Understanding the Problem Statement and Business Case

  • We live in a world where we are constantly bombarded with social media feeds, tweets, and news articles.
  • This huge data could be leveraged to predict people sentiment towards a particular company or stock.
  • Natural language processing (NLP) works by converting words (text) into numbers. These number are then used to train an AI/ML model to make predictions.
  • AI/ML based sentiment analysis models, can be used to understand the sentiment from public tweets, which could be used as a factor while making a buy/sell decision of securities.

Import Libraries/Datasets and Performed Exploratory Data Analysis

In [1]:
# import key libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud, STOPWORDS
import nltk
import re
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# Gensim is an open-source library for unsupervised topic modeling and natural language processing
# Gensim is implemented in Python and Cython.
import gensim
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS
import plotly.express as px

# Tensorflow
import tensorflow as tf
from tensorflow.keras.preprocessing.text import one_hot,Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Embedding, Input, LSTM, Conv1D, MaxPool1D, Bidirectional, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.utils import to_categorical

import string
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from transformers import pipeline
In [2]:
# Loaded the stock news data
stock_df = pd.read_csv("D:\Python and Machine Learning for Financial Analysis\stock_sentiment.csv")
In [3]:
# Let's view the dataset 
stock_df
Out[3]:
Text Sentiment
0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... 1
1 user: AAP MOVIE. 55% return for the FEA/GEED i... 1
2 user I'd be afraid to short AMZN - they are lo... 1
3 MNTA Over 12.00 1
4 OI Over 21.37 1
... ... ...
5786 Industry body CII said #discoms are likely to ... 0
5787 #Gold prices slip below Rs 46,000 as #investor... 0
5788 Workers at Bajaj Auto have agreed to a 10% wag... 1
5789 #Sharemarket LIVE: Sensex off day’s high, up 6... 1
5790 #Sensex, #Nifty climb off day's highs, still u... 1

5791 rows × 2 columns

In [4]:
# dataframe information
stock_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5791 entries, 0 to 5790
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Text       5791 non-null   object
 1   Sentiment  5791 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 90.6+ KB
In [5]:
# checked for null values
stock_df.isnull().sum()
Out[5]:
Text         0
Sentiment    0
dtype: int64
In [6]:
sns.countplot(stock_df['Sentiment'])
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x269019bf708>
In [7]:
# Found the number of unique values in a particular column
stock_df['Sentiment'].nunique()
Out[7]:
2

Performed Data Cleaning (Removed Punctuations from Text)

In [8]:
string.punctuation
Out[8]:
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
In [9]:
Test = '$I love AI & Machine learning!!'
Test_punc_removed = [char for char in Test if char not in string.punctuation]
Test_punc_removed_join = ''.join(Test_punc_removed)
Test_punc_removed_join
Out[9]:
'I love AI  Machine learning'
In [10]:
Test = 'Good morning beautiful people :)... #I am having fun learning Finance with Python!!'
In [11]:
Test_punc_removed = [char for char in Test if char not in string.punctuation]
Test_punc_removed
Out[11]:
['G',
 'o',
 'o',
 'd',
 ' ',
 'm',
 'o',
 'r',
 'n',
 'i',
 'n',
 'g',
 ' ',
 'b',
 'e',
 'a',
 'u',
 't',
 'i',
 'f',
 'u',
 'l',
 ' ',
 'p',
 'e',
 'o',
 'p',
 'l',
 'e',
 ' ',
 ' ',
 'I',
 ' ',
 'a',
 'm',
 ' ',
 'h',
 'a',
 'v',
 'i',
 'n',
 'g',
 ' ',
 'f',
 'u',
 'n',
 ' ',
 'l',
 'e',
 'a',
 'r',
 'n',
 'i',
 'n',
 'g',
 ' ',
 'F',
 'i',
 'n',
 'a',
 'n',
 'c',
 'e',
 ' ',
 'w',
 'i',
 't',
 'h',
 ' ',
 'P',
 'y',
 't',
 'h',
 'o',
 'n']
In [12]:
# Joined the characters again to form the string.
Test_punc_removed_join = ''.join(Test_punc_removed)
Test_punc_removed_join
Out[12]:
'Good morning beautiful people  I am having fun learning Finance with Python'
In [13]:
# Let's define a function to remove punctuations
def remove_punc(message):
    Test_punc_removed = [char for char in message if char not in string.punctuation]
    Test_punc_removed_join = ''.join(Test_punc_removed)

    return Test_punc_removed_join
In [14]:
# Let's remove punctuations from our dataset 
stock_df['Text Without Punctuation'] = stock_df['Text'].apply(remove_punc)
In [15]:
stock_df
Out[15]:
Text Sentiment Text Without Punctuation
0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... 1 Kickers on my watchlist XIDE TIT SOQ PNK CPW B...
1 user: AAP MOVIE. 55% return for the FEA/GEED i... 1 user AAP MOVIE 55 return for the FEAGEED indic...
2 user I'd be afraid to short AMZN - they are lo... 1 user Id be afraid to short AMZN they are look...
3 MNTA Over 12.00 1 MNTA Over 1200
4 OI Over 21.37 1 OI Over 2137
... ... ... ...
5786 Industry body CII said #discoms are likely to ... 0 Industry body CII said discoms are likely to s...
5787 #Gold prices slip below Rs 46,000 as #investor... 0 Gold prices slip below Rs 46000 as investors b...
5788 Workers at Bajaj Auto have agreed to a 10% wag... 1 Workers at Bajaj Auto have agreed to a 10 wage...
5789 #Sharemarket LIVE: Sensex off day’s high, up 6... 1 Sharemarket LIVE Sensex off day’s high up 600 ...
5790 #Sensex, #Nifty climb off day's highs, still u... 1 Sensex Nifty climb off days highs still up 2 K...

5791 rows × 3 columns

In [16]:
stock_df['Text'][2]
Out[16]:
"user I'd be afraid to short AMZN - they are looking like a near-monopoly in eBooks and infrastructure-as-a-service"
In [17]:
stock_df['Text Without Punctuation'][2]
Out[17]:
'user Id be afraid to short AMZN  they are looking like a nearmonopoly in eBooks and infrastructureasaservice'

Removed Punctuations using a Different Method

In [18]:
Test_punc_removed = []
for char in Test: 
    if char not in string.punctuation:
        Test_punc_removed.append(char)

# Joined the characters again to form the string.
Test_punc_removed_join = ''.join(Test_punc_removed)
Test_punc_removed_join
Out[18]:
'Good morning beautiful people  I am having fun learning Finance with Python'

Performed Data Cleaning (Removed Stopwords)

In [19]:
# downloaded stopwords
nltk.download("stopwords")
stopwords.words('english')
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\kusha\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Out[19]:
['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each',
 'few',
 'more',
 'most',
 'other',
 'some',
 'such',
 'no',
 'nor',
 'not',
 'only',
 'own',
 'same',
 'so',
 'than',
 'too',
 'very',
 's',
 't',
 'can',
 'will',
 'just',
 'don',
 "don't",
 'should',
 "should've",
 'now',
 'd',
 'll',
 'm',
 'o',
 're',
 've',
 'y',
 'ain',
 'aren',
 "aren't",
 'couldn',
 "couldn't",
 'didn',
 "didn't",
 'doesn',
 "doesn't",
 'hadn',
 "hadn't",
 'hasn',
 "hasn't",
 'haven',
 "haven't",
 'isn',
 "isn't",
 'ma',
 'mightn',
 "mightn't",
 'mustn',
 "mustn't",
 'needn',
 "needn't",
 'shan',
 "shan't",
 'shouldn',
 "shouldn't",
 'wasn',
 "wasn't",
 'weren',
 "weren't",
 'won',
 "won't",
 'wouldn',
 "wouldn't"]
In [20]:
# Obtained additional stopwords from nltk
stop_words = stopwords.words('english')
stop_words.extend(['from', 'subject', 're', 'edu', 'use','will','aap','co','day','user','stock','today','week','year'])
# stop_words.extend(['from', 'subject', 're', 'edu', 'use','will','aap','co','day','user','stock','today','week','year', 'https'])
In [21]:
# Removed stopwords and remove short words (less than 2 characters)
def preprocess(text):
    result = []
    for token in gensim.utils.simple_preprocess(text):
        if len(token) >= 3 and token not in stop_words:
            result.append(token)
            
    return result
In [22]:
# Applied pre-processing to the text column
stock_df['Text Without Punc & Stopwords'] = stock_df['Text Without Punctuation'].apply(preprocess)
In [23]:
stock_df['Text'][0]
Out[23]:
'Kickers on my watchlist XIDE TIT SOQ PNK CPW BPZ AJ  trade method 1 or method 2, see prev posts'
In [24]:
stock_df['Text Without Punc & Stopwords'][0]
Out[24]:
['kickers',
 'watchlist',
 'xide',
 'tit',
 'soq',
 'pnk',
 'cpw',
 'bpz',
 'trade',
 'method',
 'method',
 'see',
 'prev',
 'posts']
In [25]:
# Joined the words into a string
#stock_df['Processed Text 2'] = stock_df['Processed Text 2'].apply(lambda x: " ".join(x))
In [26]:
stock_df
Out[26]:
Text Sentiment Text Without Punctuation Text Without Punc & Stopwords
0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... 1 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... [kickers, watchlist, xide, tit, soq, pnk, cpw,...
1 user: AAP MOVIE. 55% return for the FEA/GEED i... 1 user AAP MOVIE 55 return for the FEAGEED indic... [movie, return, feageed, indicator, trades, aw...
2 user I'd be afraid to short AMZN - they are lo... 1 user Id be afraid to short AMZN they are look... [afraid, short, amzn, looking, like, nearmonop...
3 MNTA Over 12.00 1 MNTA Over 1200 [mnta]
4 OI Over 21.37 1 OI Over 2137 []
... ... ... ... ...
5786 Industry body CII said #discoms are likely to ... 0 Industry body CII said discoms are likely to s... [industry, body, cii, said, discoms, likely, s...
5787 #Gold prices slip below Rs 46,000 as #investor... 0 Gold prices slip below Rs 46000 as investors b... [gold, prices, slip, investors, book, profits,...
5788 Workers at Bajaj Auto have agreed to a 10% wag... 1 Workers at Bajaj Auto have agreed to a 10 wage... [workers, bajaj, auto, agreed, wage, cut, peri...
5789 #Sharemarket LIVE: Sensex off day’s high, up 6... 1 Sharemarket LIVE Sensex off day’s high up 600 ... [sharemarket, live, sensex, high, points, nift...
5790 #Sensex, #Nifty climb off day's highs, still u... 1 Sensex Nifty climb off days highs still up 2 K... [sensex, nifty, climb, days, highs, still, key...

5791 rows × 4 columns

In [27]:
# Obtained additional stopwords from nltk
stop_words = stopwords.words('english')
stop_words.extend(['from', 'subject', 're', 'edu', 'use','will','aap','co','day','user','stock','today','week','year', 'https'])
In [28]:
# Removed stopwords and remove words with 2 or less characters
def preprocess(text):
    result = []
    for token in gensim.utils.simple_preprocess(text):
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) >= 2 and token not in stop_words:
            result.append(token)
            
    return result
In [29]:
# Applied pre-processing to the text column
stock_df['Text Without Punc & Stopwords'] = stock_df['Text Without Punctuation'].apply(preprocess)
In [30]:
stock_df['Text'][0]
Out[30]:
'Kickers on my watchlist XIDE TIT SOQ PNK CPW BPZ AJ  trade method 1 or method 2, see prev posts'
In [31]:
stock_df['Text Without Punc & Stopwords'][0]
Out[31]:
['kickers',
 'watchlist',
 'xide',
 'tit',
 'soq',
 'pnk',
 'cpw',
 'bpz',
 'aj',
 'trade',
 'method',
 'method',
 'prev',
 'posts']
In [32]:
stock_df
Out[32]:
Text Sentiment Text Without Punctuation Text Without Punc & Stopwords
0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... 1 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... [kickers, watchlist, xide, tit, soq, pnk, cpw,...
1 user: AAP MOVIE. 55% return for the FEA/GEED i... 1 user AAP MOVIE 55 return for the FEAGEED indic... [movie, return, feageed, indicator, trades, aw...
2 user I'd be afraid to short AMZN - they are lo... 1 user Id be afraid to short AMZN they are look... [id, afraid, short, amzn, looking, like, nearm...
3 MNTA Over 12.00 1 MNTA Over 1200 [mnta]
4 OI Over 21.37 1 OI Over 2137 [oi]
... ... ... ... ...
5786 Industry body CII said #discoms are likely to ... 0 Industry body CII said discoms are likely to s... [industry, body, cii, said, discoms, likely, s...
5787 #Gold prices slip below Rs 46,000 as #investor... 0 Gold prices slip below Rs 46000 as investors b... [gold, prices, slip, rs, investors, book, prof...
5788 Workers at Bajaj Auto have agreed to a 10% wag... 1 Workers at Bajaj Auto have agreed to a 10 wage... [workers, bajaj, auto, agreed, wage, cut, peri...
5789 #Sharemarket LIVE: Sensex off day’s high, up 6... 1 Sharemarket LIVE Sensex off day’s high up 600 ... [sharemarket, live, sensex, high, points, nift...
5790 #Sensex, #Nifty climb off day's highs, still u... 1 Sensex Nifty climb off days highs still up 2 K... [sensex, nifty, climb, days, highs, key, facto...

5791 rows × 4 columns

Plotted Wordcloud

In [33]:
# Joined the words into a string
stock_df['Text Without Punc & Stopwords Joined'] = stock_df['Text Without Punc & Stopwords'].apply(lambda x: " ".join(x))
In [34]:
# Plotted the word cloud for text with positive sentiment
plt.figure(figsize = (20, 20)) 
wc = WordCloud(max_words = 1000 , width = 1600 , height = 800).generate(" ".join(stock_df[stock_df['Sentiment'] == 1]['Text Without Punc & Stopwords Joined']))
plt.imshow(wc, interpolation = 'bilinear');

Visualized the Wordcloud for Tweets that have Negative Sentiment

In [35]:
# Plotted the word cloud for text that is negative
plt.figure(figsize = (20,20)) 
wc = WordCloud(max_words = 1000, width = 1600, height = 800 ).generate(" ".join(stock_df[stock_df['Sentiment'] == 0]['Text Without Punc & Stopwords Joined']))
plt.imshow(wc, interpolation = 'bilinear');

Visualized Cleaned Datasets

In [36]:
stock_df
Out[36]:
Text Sentiment Text Without Punctuation Text Without Punc & Stopwords Text Without Punc & Stopwords Joined
0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... 1 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... [kickers, watchlist, xide, tit, soq, pnk, cpw,... kickers watchlist xide tit soq pnk cpw bpz aj ...
1 user: AAP MOVIE. 55% return for the FEA/GEED i... 1 user AAP MOVIE 55 return for the FEAGEED indic... [movie, return, feageed, indicator, trades, aw... movie return feageed indicator trades awesome
2 user I'd be afraid to short AMZN - they are lo... 1 user Id be afraid to short AMZN they are look... [id, afraid, short, amzn, looking, like, nearm... id afraid short amzn looking like nearmonopoly...
3 MNTA Over 12.00 1 MNTA Over 1200 [mnta] mnta
4 OI Over 21.37 1 OI Over 2137 [oi] oi
... ... ... ... ... ...
5786 Industry body CII said #discoms are likely to ... 0 Industry body CII said discoms are likely to s... [industry, body, cii, said, discoms, likely, s... industry body cii said discoms likely suffer n...
5787 #Gold prices slip below Rs 46,000 as #investor... 0 Gold prices slip below Rs 46000 as investors b... [gold, prices, slip, rs, investors, book, prof... gold prices slip rs investors book profits ami...
5788 Workers at Bajaj Auto have agreed to a 10% wag... 1 Workers at Bajaj Auto have agreed to a 10 wage... [workers, bajaj, auto, agreed, wage, cut, peri... workers bajaj auto agreed wage cut period apri...
5789 #Sharemarket LIVE: Sensex off day’s high, up 6... 1 Sharemarket LIVE Sensex off day’s high up 600 ... [sharemarket, live, sensex, high, points, nift... sharemarket live sensex high points nifty test...
5790 #Sensex, #Nifty climb off day's highs, still u... 1 Sensex Nifty climb off days highs still up 2 K... [sensex, nifty, climb, days, highs, key, facto... sensex nifty climb days highs key factors driv...

5791 rows × 5 columns

In [37]:
nltk.download('punkt')
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\kusha\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Out[37]:
True
In [38]:
# word_tokenize is used to break up a string into words
print(stock_df['Text Without Punc & Stopwords Joined'][0])
print(nltk.word_tokenize(stock_df['Text Without Punc & Stopwords Joined'][0]))
kickers watchlist xide tit soq pnk cpw bpz aj trade method method prev posts
['kickers', 'watchlist', 'xide', 'tit', 'soq', 'pnk', 'cpw', 'bpz', 'aj', 'trade', 'method', 'method', 'prev', 'posts']
In [39]:
# Obtained the maximum length of data in the document
# This will be later used when word embeddings are generated
maxlen = -1
for doc in stock_df['Text Without Punc & Stopwords Joined']:
    tokens = nltk.word_tokenize(doc)
    if(maxlen < len(tokens)):
        maxlen = len(tokens)
print("The maximum number of words in any document is:", maxlen)
The maximum number of words in any document is: 22
In [40]:
tweets_length = [ len(nltk.word_tokenize(x)) for x in stock_df['Text Without Punc & Stopwords Joined'] ]
tweets_length
Out[40]:
[14,
 6,
 8,
 1,
 1,
 1,
 7,
 13,
 8,
 4,
 9,
 14,
 8,
 8,
 10,
 6,
 12,
 8,
 12,
 4,
 6,
 4,
 1,
 5,
 3,
 10,
 3,
 4,
 9,
 6,
 6,
 8,
 7,
 3,
 10,
 10,
 4,
 8,
 11,
 9,
 9,
 3,
 9,
 6,
 5,
 10,
 8,
 4,
 8,
 9,
 11,
 9,
 7,
 2,
 16,
 11,
 9,
 8,
 2,
 15,
 7,
 10,
 4,
 17,
 7,
 7,
 6,
 5,
 6,
 7,
 9,
 4,
 8,
 13,
 19,
 7,
 8,
 7,
 3,
 9,
 5,
 4,
 9,
 9,
 17,
 4,
 9,
 6,
 6,
 2,
 1,
 7,
 10,
 3,
 7,
 7,
 7,
 8,
 1,
 4,
 8,
 4,
 14,
 9,
 10,
 9,
 18,
 6,
 7,
 12,
 10,
 7,
 3,
 4,
 10,
 10,
 7,
 7,
 8,
 5,
 5,
 7,
 10,
 13,
 2,
 4,
 8,
 15,
 15,
 10,
 3,
 1,
 1,
 3,
 7,
 12,
 11,
 10,
 9,
 12,
 10,
 11,
 14,
 6,
 7,
 9,
 11,
 9,
 6,
 12,
 10,
 4,
 8,
 8,
 12,
 11,
 7,
 12,
 4,
 5,
 3,
 7,
 3,
 5,
 9,
 4,
 6,
 10,
 5,
 15,
 7,
 5,
 5,
 9,
 9,
 8,
 8,
 2,
 9,
 9,
 8,
 11,
 9,
 8,
 6,
 3,
 6,
 5,
 8,
 9,
 4,
 6,
 7,
 4,
 4,
 7,
 10,
 9,
 8,
 10,
 9,
 10,
 9,
 12,
 9,
 6,
 5,
 3,
 12,
 13,
 7,
 10,
 9,
 14,
 10,
 6,
 6,
 7,
 10,
 10,
 3,
 3,
 2,
 10,
 3,
 9,
 8,
 15,
 10,
 9,
 14,
 6,
 8,
 2,
 3,
 12,
 15,
 6,
 9,
 8,
 15,
 5,
 2,
 2,
 7,
 6,
 14,
 4,
 5,
 7,
 9,
 1,
 1,
 11,
 8,
 13,
 5,
 3,
 8,
 4,
 4,
 8,
 8,
 9,
 12,
 5,
 9,
 4,
 4,
 6,
 1,
 5,
 4,
 9,
 2,
 6,
 12,
 4,
 10,
 8,
 8,
 6,
 10,
 2,
 8,
 9,
 2,
 9,
 10,
 14,
 9,
 11,
 4,
 2,
 6,
 4,
 7,
 15,
 5,
 6,
 2,
 5,
 12,
 11,
 9,
 4,
 6,
 8,
 11,
 13,
 6,
 6,
 4,
 8,
 5,
 11,
 6,
 15,
 11,
 9,
 3,
 3,
 5,
 6,
 2,
 5,
 4,
 5,
 13,
 12,
 5,
 10,
 10,
 5,
 5,
 4,
 9,
 8,
 12,
 6,
 9,
 10,
 4,
 9,
 2,
 6,
 3,
 3,
 9,
 9,
 6,
 4,
 11,
 7,
 2,
 10,
 2,
 1,
 12,
 12,
 6,
 6,
 2,
 12,
 14,
 5,
 13,
 9,
 4,
 13,
 11,
 4,
 6,
 10,
 7,
 6,
 6,
 12,
 4,
 11,
 5,
 2,
 5,
 14,
 15,
 14,
 11,
 15,
 5,
 11,
 5,
 8,
 2,
 9,
 4,
 7,
 9,
 5,
 14,
 7,
 10,
 13,
 10,
 11,
 8,
 9,
 3,
 10,
 10,
 9,
 12,
 1,
 5,
 6,
 11,
 9,
 11,
 4,
 2,
 4,
 9,
 6,
 5,
 8,
 2,
 7,
 8,
 5,
 11,
 2,
 14,
 8,
 7,
 10,
 5,
 10,
 6,
 7,
 8,
 10,
 5,
 8,
 9,
 8,
 9,
 9,
 6,
 8,
 4,
 15,
 9,
 9,
 2,
 2,
 4,
 9,
 5,
 10,
 8,
 2,
 9,
 9,
 4,
 9,
 4,
 6,
 9,
 11,
 1,
 0,
 1,
 2,
 13,
 3,
 9,
 9,
 5,
 6,
 4,
 7,
 7,
 7,
 6,
 3,
 5,
 11,
 6,
 3,
 8,
 9,
 21,
 2,
 4,
 14,
 6,
 5,
 5,
 5,
 12,
 3,
 8,
 6,
 8,
 3,
 7,
 11,
 4,
 4,
 9,
 3,
 4,
 3,
 9,
 3,
 3,
 4,
 9,
 3,
 5,
 3,
 5,
 13,
 7,
 4,
 3,
 6,
 6,
 11,
 5,
 8,
 3,
 4,
 6,
 7,
 3,
 11,
 5,
 2,
 8,
 3,
 12,
 13,
 5,
 8,
 5,
 10,
 6,
 8,
 8,
 5,
 13,
 11,
 11,
 13,
 2,
 2,
 11,
 8,
 4,
 1,
 3,
 5,
 13,
 6,
 8,
 12,
 6,
 9,
 9,
 5,
 10,
 4,
 7,
 8,
 8,
 9,
 5,
 9,
 12,
 3,
 9,
 9,
 10,
 4,
 10,
 9,
 15,
 6,
 7,
 2,
 6,
 7,
 8,
 3,
 11,
 10,
 8,
 3,
 9,
 5,
 6,
 5,
 10,
 10,
 4,
 3,
 10,
 6,
 11,
 2,
 5,
 5,
 5,
 6,
 13,
 11,
 7,
 7,
 8,
 5,
 14,
 2,
 11,
 12,
 5,
 4,
 6,
 8,
 3,
 3,
 12,
 6,
 4,
 6,
 9,
 1,
 2,
 4,
 11,
 21,
 15,
 4,
 12,
 6,
 5,
 7,
 14,
 9,
 8,
 2,
 4,
 8,
 9,
 5,
 10,
 11,
 5,
 6,
 2,
 9,
 8,
 12,
 2,
 2,
 9,
 5,
 7,
 2,
 3,
 6,
 6,
 10,
 3,
 5,
 8,
 5,
 9,
 3,
 12,
 5,
 5,
 5,
 8,
 12,
 2,
 7,
 11,
 7,
 5,
 8,
 9,
 3,
 7,
 8,
 11,
 10,
 9,
 5,
 11,
 6,
 4,
 11,
 10,
 11,
 12,
 1,
 9,
 7,
 11,
 3,
 7,
 8,
 1,
 3,
 7,
 7,
 5,
 5,
 6,
 10,
 3,
 5,
 17,
 7,
 11,
 8,
 2,
 2,
 10,
 3,
 5,
 5,
 16,
 9,
 8,
 13,
 11,
 13,
 6,
 2,
 10,
 7,
 4,
 11,
 1,
 11,
 5,
 5,
 5,
 5,
 9,
 3,
 4,
 9,
 3,
 10,
 3,
 10,
 7,
 8,
 3,
 1,
 9,
 6,
 5,
 11,
 13,
 5,
 5,
 6,
 12,
 10,
 8,
 8,
 3,
 10,
 10,
 10,
 15,
 10,
 12,
 7,
 6,
 10,
 16,
 9,
 9,
 9,
 2,
 7,
 6,
 6,
 10,
 10,
 9,
 5,
 5,
 3,
 9,
 8,
 10,
 1,
 6,
 5,
 3,
 3,
 7,
 9,
 9,
 7,
 6,
 7,
 8,
 9,
 5,
 7,
 6,
 6,
 8,
 4,
 7,
 5,
 7,
 7,
 4,
 7,
 13,
 4,
 1,
 2,
 4,
 2,
 12,
 8,
 4,
 7,
 3,
 8,
 11,
 7,
 3,
 9,
 4,
 1,
 3,
 8,
 7,
 3,
 6,
 4,
 13,
 5,
 3,
 6,
 10,
 7,
 12,
 4,
 11,
 2,
 6,
 4,
 7,
 5,
 6,
 10,
 14,
 8,
 8,
 3,
 2,
 8,
 0,
 8,
 6,
 11,
 5,
 13,
 2,
 6,
 8,
 7,
 9,
 3,
 4,
 3,
 3,
 3,
 4,
 7,
 4,
 5,
 4,
 4,
 8,
 4,
 3,
 5,
 6,
 6,
 13,
 8,
 1,
 9,
 10,
 8,
 6,
 6,
 6,
 5,
 5,
 5,
 9,
 9,
 6,
 7,
 6,
 4,
 9,
 2,
 6,
 5,
 7,
 10,
 3,
 4,
 6,
 9,
 6,
 12,
 13,
 8,
 7,
 9,
 8,
 8,
 4,
 9,
 8,
 5,
 9,
 7,
 8,
 6,
 8,
 4,
 9,
 7,
 12,
 9,
 14,
 6,
 4,
 4,
 0,
 14,
 16,
 10,
 7,
 7,
 2,
 6,
 7,
 8,
 7,
 5,
 9,
 8,
 6,
 3,
 3,
 10,
 10,
 4,
 3,
 7,
 14,
 6,
 13,
 7,
 3,
 6,
 7,
 13,
 9,
 16,
 11,
 8,
 4,
 10,
 2,
 13,
 7,
 4,
 3,
 4,
 6,
 11,
 8,
 14,
 ...]
In [41]:
# Plotted the distribution for the number of words in a text
fig = px.histogram(x = tweets_length, nbins = 50)
fig.show()
In [42]:
# Plotted the word count
sns.countplot(stock_df['Sentiment'])
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x26904c77bc8>

Prepared the Data by Tokenizing and Padding

Tokenizer

  • Tokenizer allows us to vectorize text corpus.
  • Tokenization works by turning each text into a sequence of integers.
In [43]:
stock_df
Out[43]:
Text Sentiment Text Without Punctuation Text Without Punc & Stopwords Text Without Punc & Stopwords Joined
0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... 1 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... [kickers, watchlist, xide, tit, soq, pnk, cpw,... kickers watchlist xide tit soq pnk cpw bpz aj ...
1 user: AAP MOVIE. 55% return for the FEA/GEED i... 1 user AAP MOVIE 55 return for the FEAGEED indic... [movie, return, feageed, indicator, trades, aw... movie return feageed indicator trades awesome
2 user I'd be afraid to short AMZN - they are lo... 1 user Id be afraid to short AMZN they are look... [id, afraid, short, amzn, looking, like, nearm... id afraid short amzn looking like nearmonopoly...
3 MNTA Over 12.00 1 MNTA Over 1200 [mnta] mnta
4 OI Over 21.37 1 OI Over 2137 [oi] oi
... ... ... ... ... ...
5786 Industry body CII said #discoms are likely to ... 0 Industry body CII said discoms are likely to s... [industry, body, cii, said, discoms, likely, s... industry body cii said discoms likely suffer n...
5787 #Gold prices slip below Rs 46,000 as #investor... 0 Gold prices slip below Rs 46000 as investors b... [gold, prices, slip, rs, investors, book, prof... gold prices slip rs investors book profits ami...
5788 Workers at Bajaj Auto have agreed to a 10% wag... 1 Workers at Bajaj Auto have agreed to a 10 wage... [workers, bajaj, auto, agreed, wage, cut, peri... workers bajaj auto agreed wage cut period apri...
5789 #Sharemarket LIVE: Sensex off day’s high, up 6... 1 Sharemarket LIVE Sensex off day’s high up 600 ... [sharemarket, live, sensex, high, points, nift... sharemarket live sensex high points nifty test...
5790 #Sensex, #Nifty climb off day's highs, still u... 1 Sensex Nifty climb off days highs still up 2 K... [sensex, nifty, climb, days, highs, key, facto... sensex nifty climb days highs key factors driv...

5791 rows × 5 columns

In [44]:
# Obtained the total words present in the dataset
list_of_words = []
for i in stock_df['Text Without Punc & Stopwords']:
    for j in i:
        list_of_words.append(j)
In [45]:
list_of_words
Out[45]:
['kickers',
 'watchlist',
 'xide',
 'tit',
 'soq',
 'pnk',
 'cpw',
 'bpz',
 'aj',
 'trade',
 'method',
 'method',
 'prev',
 'posts',
 'movie',
 'return',
 'feageed',
 'indicator',
 'trades',
 'awesome',
 'id',
 'afraid',
 'short',
 'amzn',
 'looking',
 'like',
 'nearmonopoly',
 'ebooks',
 'mnta',
 'oi',
 'pgnx',
 'current',
 'downtrend',
 'break',
 'shortterm',
 'correction',
 'medterm',
 'downtrend',
 'mondays',
 'relative',
 'weakness',
 'nyx',
 'win',
 'tie',
 'tap',
 'ice',
 'int',
 'bmc',
 'aon',
 'chk',
 'biib',
 'goog',
 'ower',
 'trend',
 'line',
 'channel',
 'test',
 'volume',
 'support',
 'watch',
 'tomorrow',
 'ong',
 'entry',
 'im',
 'assuming',
 'fcx',
 'opens',
 'tomorrow',
 'trigger',
 'buy',
 'like',
 'setup',
 'worries',
 'expects',
 'market',
 'rally',
 'nowusually',
 'exact',
 'opposite',
 'happens',
 'time',
 'shall',
 'soon',
 'bac',
 'spx',
 'jpm',
 'gamcos',
 'arry',
 'haverty',
 'apple',
 'extremely',
 'cheap',
 'great',
 'video',
 'maykiljil',
 'posted',
 'agree',
 'msft',
 'going',
 'higher',
 'possibly',
 'north',
 'momentum',
 'coming',
 'etfc',
 'broke',
 'resistance',
 'solid',
 'volume',
 'friday',
 'ong',
 'setup',
 'ha',
 'hitting',
 'means',
 'resume',
 'targeting',
 'level',
 'gameplan',
 'shot',
 'liked',
 'trend',
 'break',
 'ch',
 'break',
 'oc',
 'weekly',
 'trend',
 'break',
 'july',
 'fcx',
 'gapping',
 'ideal',
 'entry',
 'looking',
 'pull',
 'open',
 'entry',
 'great',
 'list',
 'particularly',
 'like',
 'fisv',
 'syk',
 'buy',
 'hold',
 'types',
 'check',
 'free',
 'list',
 'athx',
 'upper',
 'trend',
 'line',
 'ng',
 'nice',
 'pnf',
 'breakout',
 'need',
 'follow',
 'wont',
 'believe',
 'uptrend',
 'crosses',
 'swing',
 'swy',
 'float',
 'short',
 'breaking',
 'ouch',
 'biof',
 'wants',
 'comin',
 'vs',
 'inverted',
 'head',
 'shoulder',
 'play',
 'wasnt',
 'able',
 'catch',
 'entry',
 'eyes',
 'red',
 'ready',
 'break',
 'ei',
 'close',
 'breaking',
 'trigger',
 'bac',
 'quick',
 'trade',
 'latebut',
 'investing',
 'good',
 'entry',
 'point',
 'imho',
 'chdn',
 'ong',
 'trailing',
 'stop',
 'prior',
 'stops',
 'vome',
 'impressive',
 'rate',
 'probably',
 'shares',
 'traded',
 'adding',
 'vxy',
 'long',
 'trade',
 'got',
 'wpi',
 'near',
 'low',
 'repeat',
 'global',
 'economy',
 'going',
 'better',
 'instead',
 'bac',
 'goog',
 'ong',
 'close',
 'nkd',
 'looking',
 'like',
 'good',
 'short',
 'failed',
 'break',
 'price',
 'level',
 'resistance',
 'gs',
 'like',
 'price',
 'action',
 'far',
 'holding',
 'deciding',
 'ong',
 'calls',
 'stocks',
 'sbx',
 'buy',
 'clears',
 'resistance',
 'new',
 'target',
 'notice',
 'shakeout',
 'reading',
 'buy',
 'dot',
 'monday',
 'early',
 'short',
 'market',
 'needs',
 'days',
 'settle',
 'patience',
 'coh',
 'bwd',
 'dt',
 'pay',
 'obbers',
 'hit',
 'apple',
 'store',
 'paris',
 'prefer',
 'merchendise',
 'cash',
 'bullish',
 'axa',
 'tip',
 'buy',
 'sign',
 'newsletter',
 'im',
 'wrong',
 'eat',
 'crow',
 'kirby',
 'daily',
 'swinging',
 'min',
 'short',
 'longs',
 'fully',
 'added',
 'look',
 'pop',
 'marketshakeout',
 'reset',
 'ng',
 'stellar',
 'long',
 'swing',
 'position',
 'believe',
 'atice',
 'tade',
 'setp',
 'ko',
 'cnx',
 'idcc',
 'sma',
 'acting',
 'good',
 'support',
 'shortterm',
 'momentum',
 'indicators',
 'switched',
 'upside',
 'new',
 'post',
 'bac',
 'month',
 'highs',
 'showing',
 'signs',
 'stress',
 'daily',
 'bull',
 'flags',
 'otherbullish',
 'ng',
 'traditional',
 'point',
 'figure',
 'bullish',
 'signals',
 'multiyear',
 'high',
 'jpm',
 'closed',
 'sma',
 'mondayu',
 'know',
 'ruleslarge',
 'cup',
 'handlestill',
 'bullish',
 'eog',
 'ascending',
 'triangle',
 'original',
 'target',
 'extension',
 'eps',
 'feb',
 'th',
 'position',
 'swing',
 'aapl',
 'daily',
 'broke',
 'downtrendfrom',
 'closing',
 'prices',
 'pattern',
 'play',
 'target',
 'phm',
 'pultegroup',
 'option',
 'bear',
 'bets',
 'million',
 'april',
 'bac',
 'trade',
 'short',
 'setups',
 'displaying',
 'relative',
 'weakness',
 'ao',
 'coh',
 'pay',
 'bwd',
 'dt',
 'shd',
 'wtw',
 'prepare',
 'short',
 'stall',
 'esf',
 'amzn',
 'monetize',
 'electronics',
 'makes',
 'traffic',
 'marketplace',
 'goog',
 'wake',
 'didnt',
 'buy',
 'ebay',
 'cvi',
 'starting',
 'clean',
 'book',
 'making',
 'buy',
 'yr',
 'entry',
 'stop',
 'sam',
 'cheers',
 'temporary',
 'solution',
 'economic',
 'woes',
 'entry',
 'stop',
 'sk',
 'gave',
 'ovti',
 'ttm',
 'ending',
 'oct',
 'negative',
 'million',
 'operational',
 'cash',
 'flow',
 'decline',
 'vs',
 'yoy',
 'ttm',
 'ending',
 'oct',
 'goog',
 'respond',
 'positively',
 'good',
 'jobless',
 'claims',
 'numbers',
 'analyst',
 'price',
 'targets',
 'earnings',
 'targets',
 'coming',
 'downexacty',
 'want',
 'prior',
 'reporting',
 'nice',
 'color',
 'intc',
 'bull',
 'heads',
 'aa',
 'earnings',
 'ess',
 'away',
 'tho',
 'season',
 'doesnt',
 'eay',
 'begin',
 'cranking',
 'til',
 'gs',
 'jpm',
 'ebay',
 'et',
 'al',
 'trade',
 'idea',
 'buy',
 'wo',
 'market',
 'target',
 'cut',
 'trade',
 'idea',
 'buy',
 'dc',
 'market',
 'target',
 'cut',
 'goog',
 'looks',
 'set',
 'new',
 'ath',
 'unknown',
 'goog',
 'trade',
 'long',
 'verse',
 'sma',
 'fnf',
 'ong',
 'trailing',
 'stop',
 'prior',
 'stops',
 'fod',
 'solid',
 'tom',
 'predict',
 'nice',
 'green',
 'candle',
 'fitness',
 'health',
 'appsthe',
 'winners',
 'gynormous',
 'speculation',
 'run',
 'high',
 'nke',
 'time',
 'invest',
 'msft',
 'compq',
 'new',
 'post',
 'shorts',
 'dimes',
 'ongs',
 'dollars',
 'spx',
 'qqq',
 'nbelieveable',
 'payment',
 'trend',
 'vcs',
 'chasing',
 'easily',
 'played',
 'public',
 'markets',
 'ebay',
 'mastercard',
 'time',
 'highs',
 'watch',
 'list',
 'stocks',
 'triggered',
 'oex',
 'agnx',
 'tv',
 'vbd',
 'xide',
 'ppc',
 'exp',
 'onty',
 'bmn',
 'fs',
 'wk',
 'chmt',
 'fbc',
 'isi',
 'aig',
 'nice',
 'bull',
 'flag',
 'breakout',
 'oc',
 'technology',
 'software',
 'setup',
 'alerts',
 'went',
 'bonkers',
 'todayone',
 'got',
 'breakout',
 'pfe',
 'host',
 'mostcongrats',
 'cloud',
 'ax',
 'alltime',
 'highs',
 'wow',
 'steel',
 'breakout',
 'mt',
 'dndn',
 'broken',
 'clear',
 'resistanc',
 'level',
 'heavy',
 'volume',
 'headed',
 'higher',
 'expe',
 'looks',
 'good',
 'higher',
 'prices',
 'apc',
 'weekly',
 'setting',
 'long',
 'going',
 'morning',
 'long',
 'options',
 'goog',
 'calls',
 'jan',
 'th',
 'expiration',
 'esf',
 'path',
 'higher',
 'unthreatened',
 'support',
 'fails',
 'resistance',
 'spx',
 'stocks',
 'monitor',
 'lineprice',
 'dynamic',
 'stocks',
 'closed',
 'dec',
 'line',
 'new',
 'highs',
 'price',
 'mho',
 'tsm',
 'ipgp',
 'eqix',
 'bc',
 'bdc',
 'ash',
 'ddd',
 'idcc',
 'continue',
 'higher',
 'hoped',
 'pick',
 'ost',
 'looks',
 'like',
 'im',
 'late',
 'nice',
 'morning',
 'gps',
 'wow',
 'wa',
 'fast',
 'fast',
 'fade',
 'amzn',
 'going',
 'near',
 'market',
 'behaves',
 'weeks',
 'huston',
 'geen',
 'long',
 'hn',
 'breakoutbut',
 'rd',
 'look',
 'pullback',
 'enter',
 'long',
 'ng',
 'ascend',
 'triangle',
 'like',
 'fs',
 'breakouts',
 'exten',
 'targets',
 'fibs',
 'subjective',
 'kcg',
 'things',
 'hmmmm',
 'vng',
 'huge',
 'news',
 'patent',
 'attorney',
 'dan',
 'avicher',
 'ng',
 'nhod',
 'check',
 'weekly',
 'target',
 'previous',
 'lows',
 'ng',
 'nhod',
 'check',
 'weekly',
 'target',
 'previous',
 'lows',
 'aig',
 'american',
 'international',
 'group',
 'option',
 'traders',
 'bet',
 'friday',
 'balance',
 'vng',
 'buys',
 'vs',
 'sells',
 'bac',
 'consolidation',
 'level',
 'close',
 'market',
 'stable',
 'pm',
 'imho',
 'vng',
 'parabolic',
 'moves',
 'comin',
 'new',
 'sony',
 'preowned',
 'block',
 'tech',
 'patent',
 'unearthed',
 'sne',
 'gme',
 'profits',
 'pre',
 'owned',
 'turn',
 'lights',
 'liked',
 'vs',
 'love',
 'wynn',
 'entry',
 'new',
 'target',
 'yesterday',
 'nice',
 'spx',
 'wave',
 'iv',
 'points',
 'lower',
 'key',
 'esf',
 'hod',
 'neckline',
 'bearish',
 'ddd',
 'consolidating',
 'strong',
 'previous',
 'days',
 'buy',
 'point',
 'volume',
 'holding',
 'vng',
 'mil',
 'shares',
 'shorts',
 'plus',
 'long',
 'buying',
 'massive',
 'vng',
 'short',
 'squeeze',
 'thought',
 'story',
 'dead',
 'avicher',
 'kills',
 'short',
 'thesis',
 'wow',
 'good',
 'bks',
 'sales',
 'nook',
 'holidays',
 'ac',
 'watch',
 'list',
 'volume',
 'pick',
 'average',
 'bounce',
 'new',
 'wightwatchers',
 'ads',
 'fun',
 'light',
 'regular',
 'people',
 'interesting',
 'dont',
 'trust',
 'wtw',
 'position',
 'amzn',
 'holding',
 'buy',
 'point',
 'clears',
 'upper',
 'trend',
 'line',
 'heavy',
 'volume',
 'exactly',
 'want',
 'follow',
 'breaking',
 'new',
 'highs',
 'volume',
 'zcs',
 'theres',
 'buying',
 'spy',
 'qe',
 'bring',
 'geithner',
 'extended',
 'market',
 'unexpected',
 'news',
 'fed',
 'minutes',
 'esd',
 'panicky',
 'esf',
 'nqf',
 'spy',
 'ncle',
 'ben',
 'needed',
 'spice',
 'things',
 'little',
 'people',
 'bored',
 'wake',
 'close',
 'lows',
 'bears',
 'control',
 'descending',
 'trendline',
 'watch',
 'fan',
 'seeking',
 'im',
 'adding',
 'synthetic',
 'shorts',
 'add',
 'cwst',
 'low',
 'near',
 'high',
 'watchn',
 'bout',
 'minimal',
 'risk',
 'gme',
 'gamestop',
 'short',
 'targets',
 'spy',
 'time',
 'cloud',
 'stocks',
 'vmw',
 'cm',
 'bac',
 'overniight',
 'mmy',
 'pos',
 'bin',
 'premarket',
 'tmorrow',
 'polk',
 'expects',
 'new',
 'vehicle',
 'egistrations',
 'million',
 'polk',
 'gm',
 'think',
 'low',
 'bullish',
 'market',
 'talked',
 'trends',
 'scty',
 'ost',
 'angi',
 'fio',
 'panw',
 'jcp',
 'yep',
 'mnst',
 'gpn',
 'bac',
 'happens',
 'kcg',
 'bought',
 'jan',
 'calls',
 'man',
 'didnt',
 'know',
 'skippy',
 'thiswould',
 'thrown',
 'jar',
 'hormel',
 'big',
 'bonds',
 'months',
 'high',
 'rate',
 'going',
 'lower',
 'znf',
 'truly',
 'ugly',
 'good',
 'nfp',
 'number',
 'spy',
 'tt',
 'market',
 'wrap',
 'video',
 'additions',
 'watch',
 'ist',
 'including',
 'bcei',
 'bji',
 'cee',
 'es',
 'sons',
 'sqnm',
 'nx',
 'yep',
 'maybe',
 'vrng',
 'asked',
 'judge',
 'royalty',
 'risky',
 'goog',
 'deal',
 'want',
 'settle',
 'bji',
 'good',
 'volume',
 'cee',
 'sqnm',
 'nx',
 'flag',
 'break',
 'systems',
 'brings',
 'nextgen',
 'consumer',
 ...]
In [46]:
# Obtained the total number of unique words
total_words = len(list(set(list_of_words)))
total_words
Out[46]:
9487
In [47]:
# Splitted the data into test and train 
X = stock_df['Text Without Punc & Stopwords']
y = stock_df['Sentiment']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1)
In [48]:
X_train.shape
Out[48]:
(5211,)
In [49]:
X_test.shape
Out[49]:
(580,)
In [50]:
X_train
Out[50]:
2018    [prudential, liquidates, cm, stake, st, ones, ...
3584                              [strong, buying, close]
3446                   [isis, open, easiest, cent, trade]
790         [chk, min, keltbbands, fired, earlier, lunch]
900     [demark, coming, prop, yesterday, classic, dea...
                              ...                        
4505    [fcx, ooks, like, wants, test, recent, lows, s...
1204    [hedge, fund, hotel, margin, switchboard, blow...
2876                     [dvax, rocket, time, hold, long]
4681                        [complaints, point, bid, nfx]
1680    [calling, csn, ed, monthy, triangle, open, ong...
Name: Text Without Punc & Stopwords, Length: 5211, dtype: object
In [51]:
# Created a tokenizer to tokenize the words and create sequences of tokenized words
tokenizer = Tokenizer(num_words = total_words)
tokenizer.fit_on_texts(X_train)

# Training data
train_sequences = tokenizer.texts_to_sequences(X_train)

# Testing data
test_sequences = tokenizer.texts_to_sequences(X_test)
In [52]:
train_sequences
Out[52]:
[[3720, 3721, 295, 672, 135, 2529, 1909, 3722],
 [53, 57, 21],
 [1545, 89, 1910, 1546, 23],
 [296, 117, 3723, 1304, 522, 1911],
 [2530, 73, 3724, 79, 1547, 486, 97, 90, 260, 155, 35, 570, 423],
 [845, 118, 4, 210, 136],
 [122, 748, 357, 29, 82],
 [961, 227, 72, 64, 119, 99],
 [97, 1, 358],
 [3725, 75, 487, 2531, 161],
 [749, 1912, 1, 41, 27, 3, 3726, 3727],
 [1305, 750, 100, 43, 672, 846, 1306, 3728, 3729],
 [9, 523, 120, 58, 8, 359, 1913],
 [175, 3730, 1548, 1549, 1914, 962, 3731, 963],
 [27, 5, 60],
 [3732, 15, 3733, 3734, 3735, 3736],
 [181, 1110, 1, 2532],
 [2533, 1915, 24, 524, 2534],
 [398, 11, 751, 62, 3737],
 [59, 847, 673, 1916, 176, 848, 228, 571, 674, 614],
 [86, 144, 56],
 [1307, 16, 217, 10, 240, 126],
 [615, 10],
 [399, 331, 2535, 3738, 105],
 [241, 572, 424, 675, 137, 572, 101, 105, 141],
 [8, 63, 20, 400, 7, 360, 1917, 1308, 1111, 1918, 3739, 3740],
 [297, 6, 44, 2536, 1309, 13, 375, 33, 3741, 218, 3742, 317, 58, 573],
 [2537, 162, 849, 69],
 [41, 156, 376, 13, 106, 66, 2538, 2],
 [616, 219, 3743, 3744, 75],
 [2539, 1112, 89, 7, 964, 1919, 2540],
 [93, 2541, 452, 43, 220, 14, 3745, 17, 36, 22, 3746, 3747],
 [1310, 752, 14, 298],
 [1308, 127, 3748, 1308],
 [1311, 676, 2542, 574, 13, 3749, 122, 135, 18, 105, 107, 72],
 [17, 22, 150, 753, 754, 1312, 488, 3750, 2543],
 [247, 39, 375, 170, 677, 3751, 1, 94],
 [248, 128, 377, 46, 67, 33, 1920, 3752],
 [575, 2544, 1550, 102, 332, 50, 249, 34, 3753, 2545],
 [203, 1921, 1551, 678, 163, 82, 8],
 [45, 2546, 2547, 1922, 576, 250, 1313, 576, 333, 2548, 3754],
 [679, 12, 164, 221, 299, 79, 965],
 [17, 425, 36, 22, 680, 43, 211, 111, 966, 525, 20, 526, 2549],
 [129, 577, 1],
 [184, 3755, 4, 185, 46, 578],
 [1111, 617, 24],
 [967, 204, 2550, 361, 15, 1, 24],
 [1552, 41, 3756, 3757, 1314, 3758, 453, 3759, 1113, 618, 1315, 618],
 [1553, 2551, 3760, 1114, 3761, 261, 2552],
 [37, 25, 19, 2553, 16, 46, 318, 16, 1],
 [229, 299],
 [1316, 489, 53, 527, 142, 3762, 528, 2, 1923, 1924],
 [112, 28, 1925, 75, 5, 76, 157, 1926],
 [968, 850, 3763, 6, 1115, 454, 529],
 [619,
  3764,
  130,
  2554,
  334,
  1927,
  851,
  620,
  3765,
  1317,
  3766,
  74,
  31,
  3767,
  58],
 [1114, 120, 251, 16],
 [16, 2555],
 [8, 1318, 678, 530, 3768],
 [3769, 852, 853, 621, 672, 1319, 28, 8],
 [1116, 3770, 1320, 105, 3771, 755, 2556, 681, 969],
 [970, 262, 261, 1, 1928, 682],
 [2557, 2558, 80, 165, 854, 529, 51, 14, 1117, 34, 3772],
 [971, 155, 756, 855, 3773, 757, 1118],
 [3774, 3775, 3776, 2559, 2560, 1929, 182, 197, 3777, 2561, 3778],
 [1930, 130, 3779, 531, 3780, 1931, 3781],
 [3782, 758, 1932, 2, 401, 107, 683, 756, 2, 1, 300],
 [278, 402, 145, 1933, 2562, 263, 3783, 3784, 1933, 1934, 856, 1554],
 [1919,
  622,
  20,
  3785,
  2563,
  1309,
  218,
  53,
  67,
  1555,
  403,
  1119,
  70,
  161,
  623,
  532,
  6],
 [30, 579, 3, 2564, 79, 404, 279, 175],
 [3786, 222, 58],
 [335, 177, 138, 15, 99, 490],
 [1556, 3787, 3788, 3789, 3790, 43, 2565],
 [3791, 89, 7, 46, 7, 264, 336],
 [17, 22, 89, 13, 51, 111, 100, 43, 1120, 3792, 3793],
 [17, 455, 36, 35, 22, 1321, 43, 186, 51, 14, 1322, 17, 22, 319],
 [3794, 280, 1557, 57, 91, 1548, 1121, 334, 59, 265, 624, 1558],
 [17,
  857,
  36,
  22,
  131,
  95,
  1323,
  625,
  456,
  2566,
  2567,
  20,
  526,
  111,
  17,
  22,
  319],
 [1559, 457, 1122, 80, 3795, 3796],
 [378, 185, 23, 405, 376, 2568, 3797],
 [301, 1935, 759, 41, 302, 139, 2569, 7, 166],
 [1560, 56, 626, 1324, 1],
 [1123, 3798, 3799, 281, 855, 3800, 337, 972, 1936],
 [858, 3801],
 [3802, 973, 623, 39, 684, 58],
 [974, 185, 198, 242],
 [45, 3803, 966, 2570, 1937, 3804, 146, 278, 223, 3805],
 [859, 212, 46],
 [295, 1561, 860, 379],
 [171, 3806, 129, 2571, 528, 3807],
 [37, 25, 19, 2572, 92],
 [861, 12, 72, 3],
 [488, 117, 318, 230, 158, 1, 118, 35],
 [42, 862, 760, 685, 3808, 1938, 266, 863, 338, 491, 3809, 3810],
 [1, 3, 88, 1562, 458, 1325, 458, 761, 3, 42, 2573],
 [3811, 580, 176, 1939, 267, 3812, 3813, 3814, 1120, 3815],
 [3816, 2574, 17, 42, 36, 22, 1563, 2575, 1124, 83, 20, 526, 34, 3817],
 [31, 86, 492, 975, 3818, 3819],
 [1564, 459, 864, 1125, 1126, 3820],
 [1565, 3821, 83, 10, 187, 2576, 303],
 [1111, 52, 865, 53, 1326, 380, 527, 2, 150],
 [113],
 [627, 33, 3, 1940, 1941, 426, 3822, 1327, 3823, 1127, 2577, 976, 30, 92, 203],
 [1328, 1329, 1128, 2578, 282, 220, 172],
 [11, 362, 362],
 [3824, 3825, 2579, 40, 76, 10],
 [268, 173, 1330, 686, 269, 493, 494, 231, 106, 628, 1129, 977, 249],
 [3826, 1130, 3827, 1331, 533, 2580, 3828, 3829, 978],
 [263, 3830, 629, 75, 331, 524, 211, 1942, 339, 3831, 630, 1332],
 [575, 680, 581, 249, 51, 14, 159, 3832, 3833],
 [70, 23, 114, 79, 757, 2, 178, 2581],
 [280,
  144,
  30,
  866,
  1566,
  2582,
  1943,
  1567,
  2583,
  3834,
  2582,
  3835,
  866,
  1333,
  3836],
 [979, 453, 12, 127, 72],
 [295, 5, 185, 23, 687],
 [270, 2584, 3837, 151, 495, 406, 980],
 [1131, 3838, 7, 42, 16, 496, 10, 187],
 [147, 8, 63, 47, 2, 178],
 [3839, 981, 762, 867, 2585, 2586, 2587, 39, 1568, 320, 1944, 18],
 [497, 688, 19, 1569, 40, 407, 3840, 232, 460],
 [2, 868, 281, 3, 2588],
 [9, 23],
 [37, 25, 188, 19, 3841, 188],
 [869, 3842, 3843, 101, 1132, 304, 33, 3, 6],
 [982, 101, 4],
 [189, 115, 90, 271, 135, 123],
 [1570, 1945, 87, 123, 190, 983, 219, 143, 272, 54, 984, 631, 3844],
 [981, 4, 32],
 [155, 53, 2589, 985],
 [340, 331, 1334, 3845, 26, 1133, 152, 108, 40],
 [1946, 632, 423, 77, 155, 305, 77, 3846, 58, 3847, 218],
 [1947, 118, 10, 2590, 205, 26, 111],
 [870, 3848, 986, 580, 2591],
 [298, 83, 1948, 282, 333, 763, 689, 1949, 172, 3849, 3850],
 [851, 102, 211, 630, 4, 755],
 [249, 1950, 2592, 333, 764, 14, 2593, 100, 987, 1306, 2594],
 [206, 103, 3851, 1134, 3852, 3853, 2595, 1335, 14, 159, 3854],
 [92, 128, 341, 283, 3855, 13, 534, 633, 1, 219],
 [765, 498],
 [321, 1, 3856, 381, 54, 172, 634, 42],
 [2596, 16, 217, 10, 240, 126],
 [9, 73],
 [207, 3857, 535, 3858, 988, 1135, 3859],
 [1951, 8, 63, 2, 252, 1571, 47],
 [360, 1336, 2597],
 [9, 284, 124, 3860, 92],
 [45, 1136, 3861, 224, 3862, 871, 690, 872, 285, 2598, 272, 1137, 3863],
 [3864, 2599, 153, 191, 3865, 1952, 100, 582, 1337, 3866, 3867, 3868],
 [37, 25, 19, 3869],
 [461, 1953, 32, 3870],
 [32],
 [3871, 1320, 33, 322, 59, 265, 2593, 172, 1572, 3872, 3873],
 [459, 864],
 [32, 158, 4, 306, 6, 60, 233, 199, 691],
 [6, 536, 1954, 427, 2600, 3874, 3875],
 [225, 281, 1955, 192, 1956, 67],
 [296, 3876],
 [231, 106, 2601, 243, 96, 537, 205, 51, 1957, 635, 636],
 [382, 135, 123, 10],
 [766, 873, 3877, 359, 69],
 [342, 74, 16, 3878, 80],
 [340, 1338, 1958, 3879, 874],
 [767, 193, 19],
 [4, 692, 1959, 28, 3, 875],
 [323, 83, 876, 243, 343, 3880, 3881, 3882, 2602],
 [2603, 428, 1960],
 [321,
  583,
  3883,
  1,
  94,
  381,
  989,
  383,
  5,
  990,
  3884,
  165,
  167,
  991,
  3885,
  3886,
  50,
  408],
 [363, 768, 168, 877, 3887],
 [111, 2, 462, 105, 1573, 1961],
 [307, 36, 1, 135, 21, 878, 992, 24, 251, 137, 993],
 [59, 2604, 154, 379, 143, 154, 59],
 [381, 342, 1339, 1114, 994, 1138, 229, 20, 995, 996, 1574],
 [769, 3888, 693, 88, 172, 130, 1575, 1962],
 [637, 3889, 3890, 57],
 [2605, 2606, 132, 88, 339, 6, 1139, 208],
 [997, 146, 4],
 [364, 998, 1560, 1340, 2607, 3891, 3892, 3893, 1963, 1964, 86, 2608],
 [38, 493, 999, 1000, 260, 770, 4, 499],
 [130, 3894, 1576, 227, 15, 77, 71, 143, 2609],
 [37, 25, 19, 3895, 500],
 [253, 31, 28, 1341, 3896, 409, 4, 499],
 [8, 2610, 1, 84],
 [584, 24, 29, 47],
 [970, 116, 23, 308, 47, 336],
 [3, 4, 107, 1342, 148, 54, 879, 85],
 [771, 3897],
 [81, 188, 19, 1965],
 [342, 429, 365, 57, 383, 1921, 766, 20, 3898, 165, 1577, 16],
 [242, 3899, 1966, 463, 2611, 2612, 861],
 [3900, 1967, 3901, 3902, 3903, 333, 2613, 153, 102, 1578],
 [3904, 1140, 18],
 [981, 31, 2, 168, 1001, 21],
 [585, 464, 523, 3905, 1141, 32, 24, 85, 694, 1142],
 [213, 759, 306, 344],
 [41, 234, 3, 2614, 1343],
 [2615, 8, 63, 47, 501, 2, 178],
 [200, 309, 880, 235, 1968, 2616],
 [1002, 24, 145, 1118, 366, 881],
 [1143, 113, 367],
 [1144, 12, 90, 27, 1579],
 [2617,
  1145,
  586,
  2618,
  2619,
  1344,
  384,
  338,
  191,
  95,
  1969,
  1146,
  3906,
  3907],
 [3908],
 [12, 318, 638, 41, 4, 23, 368, 286],
 [584, 538, 6, 3909, 1580, 3910, 20, 617, 882, 430],
 [695, 883, 8, 1345, 116, 254, 880, 225],
 [6, 1970, 2620, 1147, 2591, 1148],
 [1581, 112, 219, 587, 76, 7, 38, 79, 12],
 [150, 30],
 [1003, 3911, 1, 249, 490],
 [1971, 1972, 99, 876, 1004, 164, 32, 755, 1005, 3912],
 [1346],
 [20,
  772,
  1316,
  502,
  1582,
  431,
  1583,
  773,
  2621,
  432,
  884,
  3913,
  310,
  539,
  2622,
  1347],
 [81, 188, 19, 369, 236],
 [3914, 774, 626, 775, 370],
 [2623, 88, 588, 433, 77],
 [885, 270, 38, 13],
 [234, 3, 194, 60, 198, 757],
 [280,
  886,
  3915,
  1125,
  84,
  280,
  118,
  75,
  1348,
  71,
  2624,
  187,
  1584,
  3916,
  624,
  1006,
  408],
 [14, 153, 2625, 2626, 1973, 281, 209, 1149, 160, 1349, 3917, 3918],
 [9, 3919, 78, 362, 21, 1929, 345, 150, 80, 11, 359, 21, 887],
 [17, 489, 36, 35, 22, 43, 776, 410, 308, 465, 17, 22, 319],
 [1974, 1569, 2],
 [433, 1585, 4, 222, 44],
 [45, 2627, 3920, 1975, 1132, 534, 199, 3921, 3922, 1976, 3923, 3924, 3925],
 [1977],
 [52, 222, 1007, 27, 227, 72],
 [2628, 2629, 250, 503, 3926, 2630, 1150, 1120, 3927],
 [385, 152, 523, 60, 344],
 [3928, 339, 696, 8],
 [2631, 152, 7, 3929, 5, 2],
 [174, 237, 28, 697, 49],
 [9, 120, 71, 1973],
 [2632, 1151, 79, 698, 35, 42, 197, 21, 777, 32, 1],
 [81, 25, 19, 3930, 92],
 [8, 63, 888, 89, 2, 178],
 [3931, 673, 26, 2633, 190, 225, 346, 889, 34, 3932, 3933],
 [16, 996],
 [52, 427, 344, 167, 540, 157, 84, 12, 362, 2634],
 [9, 54, 3934, 220],
 [347, 129, 1, 54, 36],
 [141, 1110, 1, 223, 181, 434],
 [307, 3935, 3936, 1008, 1009, 1350, 32, 1586],
 [1, 200],
 [58, 3937, 2635, 778, 3938, 751, 105, 244, 3939, 1587],
 [32, 251, 89, 1010],
 [1011, 35, 1978, 42],
 [150, 101, 63, 177, 890, 23, 1979, 1152, 1588],
 [238, 779, 52, 891, 4],
 [253, 1, 217, 10, 240, 126],
 [780, 6, 3940, 460, 3941, 15, 504, 1, 46, 85],
 [4, 57],
 [386, 3942, 20, 52, 616, 532, 387, 505, 109, 1012, 1153, 615, 1154, 781, 161],
 [1013, 59, 1589, 80, 324, 409, 80, 589],
 [194, 375],
 [435, 575, 1980, 131, 94, 2636, 2637, 3943],
 [3944, 1981, 2638, 619, 3945],
 [3946, 782],
 [3947, 1, 892, 82],
 [699, 893, 49, 131, 84, 193, 26, 120, 7, 40],
 [41, 1982, 1],
 [78, 541, 3948, 1351, 3949, 1983, 1352, 700, 1935, 3950],
 [1155, 8, 162, 1],
 [701, 1984, 428, 1985, 1014, 195, 1986, 1987, 10, 187],
 [231, 542, 1156, 366, 51, 14, 639, 3951, 3952],
 [1988, 60],
 [287, 1157, 1590, 1989, 1591, 3953, 167, 1990, 3954, 83, 783, 3955, 34, 3956],
 [436, 172, 1991, 325, 702, 233, 341, 151, 691, 466, 337, 1158, 172],
 [97, 33, 1592, 1015, 214, 190, 638, 193, 388, 5, 1],
 [70, 66, 127, 72, 502, 2, 694, 62, 5],
 [1016, 8, 63, 170, 72, 47, 2, 252],
 [1593, 42, 110, 1, 223, 2639, 1593, 144, 251, 3957, 2640, 784],
 [70, 116, 46, 212],
 [2641, 1159, 1992, 263, 238, 209, 80, 3958, 3959],
 [364, 1993, 52, 1160, 3960, 2642, 3961, 1017, 3962, 3963, 1594, 3964],
 [56, 389, 467, 894],
 [498, 785],
 [17, 22, 163, 89, 35, 67, 1353],
 [456, 457, 1122, 590, 2643],
 [885],
 [591,
  3965,
  3966,
  3967,
  2644,
  3968,
  1018,
  690,
  263,
  2645,
  1586,
  784,
  1994,
  3969,
  3970,
  3971,
  3972,
  2646],
 [1, 85, 3973, 2647],
 [52, 427, 344, 167, 540, 157, 84, 12, 362, 2634, 1017, 58],
 [9, 887, 305],
 [1595, 523, 3974, 3975, 26, 2648, 581, 1995, 154, 38],
 [340, 6, 321, 111, 2649, 93, 640, 3, 851, 6],
 [243, 3976, 1354, 3977, 1161, 1996, 786, 1596, 1355, 1597],
 [468, 4, 3978, 1598],
 [206,
  103,
  1997,
  288,
  1356,
  641,
  787,
  3979,
  2650,
  3980,
  635,
  3981,
  3982,
  3983],
 [97, 3984, 460, 67, 371, 3985, 2651, 1599, 383],
 [2652, 3986, 642, 703, 8, 63, 1162, 888, 26, 89],
 [1998, 895, 348, 1600, 643, 1999, 503, 43, 2653, 3987, 3988],
 [45, 3989, 48, 682, 15, 3990, 3991, 1998, 895, 348, 1600, 643, 3992],
 [9, 986, 118, 84, 48],
 [1163, 1547, 543, 2654, 2655, 146, 190, 543, 2654, 3993],
 [78, 2656, 89, 234, 3, 7, 55],
 [3994, 132, 1601, 30, 160, 349],
 [161,
  1602,
  59,
  265,
  3995,
  2657,
  322,
  59,
  1019,
  1357,
  1358,
  59,
  265,
  3996,
  506],
 [15, 2658, 41, 469, 1575, 2000, 1603, 852],
 [155, 896, 1604, 3997, 2659, 3998, 3999, 12, 1605],
 [41, 1164, 27, 1164],
 [41, 32, 131, 15, 2660, 225],
 [2001, 1165, 1568, 756, 399],
 [347, 897, 2002, 425, 38],
 [1606, 5, 76, 357, 82, 389, 29],
 [845, 1166, 1359, 1607, 1360, 788],
 [404, 507, 544, 74, 4, 4000, 4001, 1167, 31],
 [506, 1020, 617, 13, 2661],
 [30, 193, 2, 4002, 29, 697, 104],
 [17, 36, 22, 1128, 36, 23, 470, 786, 1596, 1355, 4003, 1126, 898],
 [5, 150, 2, 73, 253, 899],
 [704, 8, 508, 311, 47],
 [1168, 1, 1169, 705, 57, 133, 2662, 2662, 2663, 4004, 4005],
 [1608, 4006, 179, 2664, 255, 210],
 [706, 26, 4007, 545, 2003, 19, 789, 1341, 4008, 790],
 [411, 1609, 2004, 437, 304, 1361, 93],
 [546, 2665, 52, 46, 98, 26, 381, 31],
 [705, 4009, 2666, 1123, 2666, 1592, 4010, 2667, 1610, 10, 756, 13, 1],
 [1611, 131, 2005, 289, 4011, 4012, 1362, 162],
 [207, 12, 28, 44, 21],
 [4013, 61, 10, 707, 50, 1170, 1, 131],
 [4014, 700, 963, 37, 25, 19, 1, 4],
 [16, 2006, 242, 4015, 4016, 1612, 4017, 2668, 1613, 4018],
 [1591, 2007, 1614, 1989, 791, 1615, 167, 172, 4019, 2669, 2670, 4020, 4021],
 [3, 2671, 272, 148, 4022, 1616],
 [194, 137, 1342, 179, 4023, 137, 1021, 1022, 1023, 237],
 [4024, 1000, 2008, 792, 13, 18, 1617, 25, 169],
 [245, 1171, 793, 390, 1157, 708, 49],
 [290, 898],
 [228, 471, 1618, 794, 4025, 2009, 987, 4026, 2672],
 [213, 1004, 69],
 [4027, 709, 6, 148],
 [86, 245, 4028, 2010, 886, 65, 350, 50, 2],
 [2673, 170, 18, 40, 4029, 53, 2, 2011, 13, 338],
 [2, 868, 273],
 [967, 151, 472, 360, 140, 283, 406, 140, 199],
 [3],
 [2012, 83, 618, 243, 343, 4030, 323, 83],
 [269, 4031, 4032],
 [2013, 4033, 71, 54, 586, 2014, 207, 1, 4034],
 [175,
  1619,
  4035,
  67,
  710,
  210,
  90,
  107,
  4036,
  4037,
  701,
  2674,
  2675,
  538,
  963,
  1914,
  4038,
  4039,
  4040],
 [2676, 291, 363, 768, 675, 4041, 4042],
 [547, 312],
 [4043,
  862,
  760,
  4044,
  333,
  1948,
  592,
  871,
  4045,
  2628,
  4046,
  1969,
  4047,
  4048],
 [548, 4049, 20, 15, 4050, 4051, 4052, 2015, 634, 2016, 2677, 549, 4053],
 [96, 4054, 365, 4055, 241, 1158, 172, 1172, 391, 4056, 4057],
 [351, 242, 16, 20, 4058, 4059, 4060, 900],
 [48, 4061, 1620],
 [4062, 711, 7, 332, 40],
 [206, 103, 1990, 1173, 644, 176, 2678, 1363, 1618, 4063, 4064, 4065, 1364],
 [759, 407, 143, 4066, 4067, 4068, 4069, 4070],
 [2679, 300, 1, 13, 3, 463, 197, 462, 401, 107, 683, 2, 1, 281],
 [550, 4, 138, 44, 1365, 167, 399, 1621, 793, 44, 123],
 [712, 645, 1622, 170, 7, 40],
 [4071, 231, 1366, 4072, 4073, 677, 4074, 618, 1623, 4075],
 [12, 1020, 368, 12, 31, 210],
 [1624, 2017, 4076, 2018, 2680],
 [438, 1625, 33, 5, 24, 158, 438],
 [901, 5, 158, 2019, 198, 694, 56, 990, 73],
 [584, 383, 4077, 4078, 1626, 4079],
 [4080, 4081],
 [4082, 127, 15, 2681, 4083, 224, 127, 72, 64, 1917, 2682],
 [6, 98, 4084, 1627, 299],
 [795, 12, 72, 509],
 [1024, 66, 713, 196, 49, 47, 871, 40, 42, 2, 254, 501],
 [538, 1628],
 [4085, 300, 1, 41, 11, 870, 1025, 902, 4086, 537, 2020, 4087, 18],
 [578, 796, 4088, 1367, 1174, 190, 142, 297, 573, 58, 4089, 125],
 [88, 4090, 473, 870, 370],
 [352, 237, 35, 4091, 4092, 797, 4093, 1629, 4, 464, 128],
 [1630, 551, 7, 40],
 [4094, 2683, 470],
 [238, 4, 892, 256, 31, 62, 1368, 362, 4095, 182, 2684, 4096, 23],
 [385, 798, 1369, 151, 127, 775, 91, 1026],
 [2685, 188],
 [1631, 127, 72],
 [2686, 4],
 [68, 326, 112, 1632, 10, 357],
 [2687, 1370, 982, 2021, 20, 799, 119, 593, 4097, 2022, 685, 366, 2021, 1175],
 [714, 2688, 2, 201, 466, 181],
 [903, 701, 389, 38],
 [27, 144, 185, 23, 4, 2023, 40],
 [80, 1371, 1305, 412, 188, 1173, 1022, 1633],
 [695, 883, 9, 4098, 49, 123, 177, 83],
 [2024, 318, 163, 17, 22],
 [904, 2, 1932, 27, 116, 107, 413, 2689],
 [4099,
  646,
  996,
  637,
  647,
  1964,
  2690,
  4100,
  4101,
  382,
  1372,
  4102,
  1176,
  2691,
  2692,
  1584],
 [228, 471, 800, 4103, 4104, 764, 14],
 [1634, 251, 75],
 [2693, 53, 11, 801, 2694, 305],
 [4105, 552, 1177, 4106, 1178, 553, 360],
 [4107, 1373, 1374, 8, 392, 474, 20, 5, 1375, 15, 53],
 [2025, 46, 438, 1027, 99, 4108, 172],
 [1376, 1028, 1179, 44, 152, 191, 83, 4109, 14, 4110],
 [648, 185, 23, 405, 376, 4111, 1635, 648],
 [1114, 130, 20, 121, 85, 645, 4112, 74, 16],
 [52, 161, 368, 503, 7, 2023, 197, 2695, 1029, 144, 39, 142, 76],
 [174, 12, 788, 128, 750, 16, 53],
 [45, 905, 1636, 2696, 906, 167, 430, 4113, 61, 274, 103, 132],
 [907, 20, 715, 282, 153, 4114, 167, 636],
 [248, 649, 120, 1030, 13, 61, 411, 425, 225],
 [393, 191, 49, 357, 82, 4, 5, 903],
 [6, 27, 5, 131, 8, 24],
 [25, 132, 4115, 29, 439, 4, 18],
 [799],
 [270, 254, 1637, 351, 42],
 [2697, 908, 28, 157, 119, 99, 256, 53, 31, 55, 909],
 [364,
  1031,
  213,
  475,
  431,
  716,
  1638,
  1639,
  2698,
  4116,
  1180,
  716,
  431,
  2026,
  2699,
  2700],
 [476, 141, 2027, 185, 414, 1640, 24, 304, 487, 476],
 [93, 154, 1032, 59],
 [352, 2],
 [12, 183, 2, 385, 28],
 [6, 115, 1],
 [4117, 2028, 4118, 52, 1641, 134, 76, 157, 166, 44, 2701, 1642],
 [194, 1149, 4119, 143, 71, 4120],
 [194, 2702, 1643, 46, 4121],
 [771, 73, 4122, 11, 2703, 4123, 11, 1644, 802, 11, 2029, 2704, 2030],
 [1181, 10, 475],
 [994, 264, 909, 49, 2705, 67, 48, 21, 1033],
 [1034, 4124, 1034, 6, 1182, 1183, 4125, 38, 53, 3, 4126],
 [415, 8, 162, 90, 2706],
 [201, 56, 29, 1645, 49, 8],
 [647, 357, 82],
 [45, 2031, 4127, 792, 95, 75, 717, 2707, 582, 471, 4128, 1377, 4129],
 [3, 234, 31, 119, 105, 2, 2032],
 [41, 440, 1, 4],
 [910, 31, 275, 2033, 803, 1184, 178, 135, 18, 1026],
 [2615, 2034, 4130, 2035, 77, 4131, 650, 102],
 [1175, 56, 4132],
 [4133, 4134, 651, 1035, 2708, 88, 1378, 1646, 130, 235, 349],
 [1379, 241, 1185, 2036, 12, 37, 353],
 [347, 117, 441],
 [2709, 118, 10, 4135, 652, 1186, 49],
 [95, 257, 1647, 402, 4136, 985, 51, 354, 2710, 4137],
 [4138, 113, 1581, 55, 24, 2711, 24, 394, 25, 55, 24, 1026],
 [147, 12, 28, 2, 49, 2037],
 [32, 1036],
 [116, 2712, 182, 4139, 150, 255],
 [1648, 1648, 4, 4140, 1],
 [11, 59, 83, 4141, 4142],
 [1649, 292, 4143, 273, 273, 868],
 [4144, 703, 176, 4145, 911, 4146, 145, 1650, 4147, 176, 4148],
 [171, 18, 4149, 791, 1615, 59, 2038],
 [4150, 1962, 1651, 1343, 40],
 [1380,
  1989,
  4151,
  1381,
  2713,
  14,
  215,
  4152,
  2714,
  1187,
  4153,
  4154,
  34,
  4155],
 [1382,
  2039,
  533,
  912,
  717,
  4156,
  2715,
  4157,
  4158,
  4159,
  1188,
  4160,
  4161,
  4162],
 [30, 913, 30, 621, 384, 4163, 4164, 4165],
 [9, 208, 54, 48],
 [146, 970, 128, 75, 1383, 4166, 2716, 1997, 2040, 2717, 2041, 2718],
 [81, 2719, 1370, 4167],
 [2720, 4168, 149, 554, 1652, 399, 2721, 760, 4169, 4170, 4171, 4172],
 [4173, 2042, 2043, 643, 4174, 4175, 1653, 914, 384, 43, 1189, 4176, 4177],
 [18, 116, 1130, 16, 46, 375, 4178],
 [2044, 4179, 4180, 2722, 4181, 4182, 898, 804, 1037],
 [64, 972],
 [327, 108, 1654, 594, 4183, 594, 594, 594],
 [1655, 128],
 [17, 239, 36, 35, 22, 100, 20, 4184, 51, 576, 2723, 4185, 4186],
 [81, 64, 19, 4187, 16, 1],
 [364, 1993, 30, 615, 1656, 1017, 915, 4188],
 [524, 15, 31, 1023, 69],
 [81, 188, 19, 4189, 477, 92],
 [37, 25, 19, 4190, 61, 1],
 [1657,
  93,
  328,
  126,
  4191,
  2724,
  2631,
  2725,
  4192,
  1014,
  973,
  84,
  172,
  2045,
  1190],
 [367, 9, 2726, 4193, 121, 526, 620],
 [4194, 390, 85],
 [699, 39, 142, 27, 5, 32, 44, 474],
 [1191, 916, 1384, 187, 23, 917, 328, 595],
 [30, 27],
 [112, 66],
 [387, 578, 2046, 132, 2046, 4195, 718],
 [226, 25, 486, 97, 1192, 2, 2727, 4196, 428, 4197],
 [6, 271, 135, 123, 10],
 [679, 4198, 4199, 4200, 918, 1317, 1038, 4201, 16, 303, 805],
 [78, 46, 161, 32, 44, 324],
 [719, 714, 806, 1188, 2728, 2729, 1188, 585, 1193, 1039, 1170, 104],
 [2047, 16, 138, 400],
 [2048, 4202, 1385, 1386, 313, 682, 7, 2049, 2730, 478, 4203, 4204],
 [270, 117],
 [2731, 991, 13, 2050, 4205, 399, 88],
 [293, 59, 1552, 749, 596, 75],
 [402, 409, 166, 2732, 2051, 132, 228, 571, 4206, 2052, 4207],
 [9, 18, 1658, 762, 2733, 4208, 4209],
 [1308, 90, 597, 47, 2, 252],
 [506, 403, 276, 2734],
 [17, 239, 36, 35, 22, 14, 2735, 100, 43, 17, 22, 319],
 [1194, 54, 15, 2736, 649, 1038, 1378, 137, 598, 4210, 2737, 6, 207],
 [711, 49, 256],
 [39, 720, 416, 2053],
 [1040, 984, 4211, 190, 2738],
 [585, 599, 2639, 40],
 [4212, 54, 693, 104, 4213, 1, 4214, 4215, 4216, 1195, 1196, 919, 762, 785],
 [2739, 57, 149, 46, 414, 510, 59, 265, 408, 4217, 78, 807],
 [2740, 66, 33, 5, 44, 140, 199, 2741],
 [95, 106, 2054, 51, 4218, 100, 582, 471, 4219, 2742, 2055, 257, 34, 4220],
 [1041, 8, 63, 47, 2, 252],
 [699, 12, 48, 29, 33, 192, 363, 13],
 [147, 4221, 2056, 4222, 11, 38, 39],
 [16, 10, 894, 134, 3, 2057],
 [4223, 148, 4224, 4, 202, 1387],
 [70, 31, 4225, 800, 50, 287],
 [1659, 128, 4, 149, 10],
 [114, 12],
 [17, 22, 523, 297, 51, 2058, 23, 14, 636, 1980, 4226, 4227],
 [9, 1042, 1660],
 [147, 64],
 [2059, 4, 2743, 24, 180, 454, 281, 133, 28],
 [2060, 6, 2061, 1197, 4228, 383, 84],
 [129, 2744, 1],
 [293, 117],
 [145, 2062, 555, 2745, 1198, 2017, 4229, 549, 4230, 1661, 1, 145],
 [172, 4231, 417, 4232, 4233, 1129, 4234, 719, 165, 205, 4235, 4236, 2063],
 [1384, 33, 3, 162, 479],
 [9, 83, 1199, 1388, 395, 172, 20, 4237],
 [1043, 12, 254, 314, 355, 681, 193, 49, 2, 4],
 [2064, 1, 197, 24, 50],
 [6, 212, 1, 46, 193, 214, 2, 1662, 23],
 [2065, 442, 5, 23, 690, 208],
 [456, 457, 163, 1044, 4238, 50, 80, 2636],
 [2066, 1371, 402, 534, 167, 2746, 4239, 333, 4240, 2747, 4241],
 [1311, 313, 920, 300, 77, 711, 480],
 [600, 874, 1148, 15, 54, 4242, 2748, 2749, 4243],
 [9, 12, 105, 1956, 198, 4244, 192, 32, 84],
 [700, 57],
 [4, 1389],
 [1663, 4245],
 [174, 632, 219, 110, 1390],
 [2750, 115, 363, 13, 271, 2067],
 [277, 689, 4246, 1045, 139, 921, 2751, 71],
 [248,
  624,
  65,
  175,
  1032,
  4,
  57,
  721,
  248,
  1,
  279,
  651,
  430,
  486,
  2752,
  2753,
  1,
  1200],
 [1391,
  366,
  2068,
  808,
  1664,
  110,
  4247,
  675,
  2754,
  1046,
  4248,
  71,
  1142,
  4249,
  52,
  1047,
  156],
 [8, 1585],
 [157, 124, 4250, 3, 1392],
 [70, 204, 55, 13, 241, 120, 481],
 [109, 118, 1665, 62, 4251, 4252, 128, 7, 210, 1201],
 [1393, 2755, 46, 1393, 1202, 2069, 309, 177],
 [109, 67, 32, 1666, 460, 592, 2070],
 [1394, 1667, 69, 84],
 [2071, 16, 217, 10, 240, 126],
 [100, 95, 257, 409, 220, 329, 91, 2756, 4253, 4254, 4255, 4256],
 [45, 2072, 2073, 535, 4257, 4258, 922, 209, 4259, 65, 556],
 [45, 1395, 4260, 462, 39, 231, 4261, 15, 1112, 2757, 190, 2074, 4262],
 [48, 722, 1396, 499, 1048, 653, 125, 4263, 88],
 [619, 771, 367, 340, 2075, 1203, 2758, 1646, 2076],
 [95,
  106,
  1044,
  324,
  4264,
  190,
  508,
  2759,
  2760,
  923,
  2761,
  4265,
  34,
  4266,
  4267],
 [258, 920, 2762, 480, 271],
 [17, 455, 36, 13, 22, 924, 43, 90, 220, 17, 22, 319],
 [4268, 26, 48, 461, 654],
 [1608, 13, 2, 137, 108, 2763, 418, 1204, 31],
 [1668, 284, 28, 180, 29],
 [398, 66, 185, 2764, 33, 5],
 [97, 56, 82, 1397, 1669, 1978, 443, 5, 15, 4269, 723],
 [575, 2765, 7, 332, 50, 249, 4270, 4271],
 [56, 468, 202, 156, 476, 655, 1670, 2077],
 [795, 292, 12, 511, 479],
 [584, 4272, 1, 279],
 [2078, 4273, 4274, 4275, 412, 4276, 411, 329, 34, 4277],
 [476, 2079],
 [393, 1205, 809, 4278, 139, 2766, 4279, 349],
 [398, 656, 286, 501, 3, 66, 4280, 3],
 [318, 810, 242, 4281, 704, 2767, 4282, 468, 2768, 2080, 963, 4283, 476],
 [145,
  811,
  2081,
  444,
  4284,
  4285,
  2769,
  1398,
  703,
  77,
  2082,
  238,
  329,
  2770,
  4286,
  4287],
 [1559, 457, 1122, 590, 657, 255],
 [283],
 [122, 385, 186, 9, 2771, 9],
 [174, 1049, 658, 4288, 4289, 1206, 4290],
 [307, 1207, 28, 688, 55, 47, 10, 179],
 [659, 64, 1399, 4291, 4292, 4293, 190, 1399, 1312, 4294],
 [1208,
  4295,
  4296,
  4297,
  1191,
  2772,
  4298,
  167,
  757,
  1400,
  391,
  52,
  161,
  1209,
  1401,
  2083],
 [601, 1402, 2084, 597],
 [1307, 16, 217, 10, 240, 126],
 [724, 4299, 1210, 4300, 724, 904, 4301],
 [45,
  4302,
  907,
  2773,
  2085,
  2774,
  1323,
  973,
  2086,
  660,
  2087,
  1671,
  412,
  4303],
 [4304, 1403, 2775, 4305, 1404, 595],
 [9, 925, 1025, 124],
 [541, 291, 1, 2088, 223, 166, 1211, 2089, 4306],
 [1050, 12, 602],
 [4307, 711, 201, 49, 82, 24, 4308, 39],
 [781, 109, 4309],
 [322, 59, 265, 2090, 91, 112, 371, 337, 91, 91, 2776],
 [2575, 154, 39, 926, 4310, 267, 2091, 11, 61, 99],
 [4311, 261, 179],
 [699, 592, 1405, 57, 1672, 208, 2092, 1368, 253],
 [1945, 4312, 152, 337, 1673, 30, 280, 81, 603, 2777, 302, 352, 4313, 913],
 [1212, 261, 681, 49, 193, 1213],
 [1051, 1674, 42, 225, 18, 35, 4314, 890],
 [2778, 725, 474, 589, 1675, 4315, 183, 291, 223],
 [1052, 12, 211, 142, 8],
 [102, 126, 544, 79, 350, 1676],
 [41, 50],
 [2093, 2779, 99, 11, 2779, 272, 39, 38],
 [382, 38, 1, 39, 142],
 [113, 646, 6, 845, 4316, 2071, 2780, 394, 2094],
 [274, 103, 2095, 93, 1406, 2096, 1214],
 [45,
  1677,
  2781,
  2097,
  244,
  2782,
  4317,
  4318,
  1567,
  4319,
  4320,
  719,
  344,
  4321],
 [206, 103, 1927, 4322, 2783, 306, 51, 153, 34, 4323, 4324],
 [714, 101, 1, 46, 79, 181],
 [4325],
 [385, 798, 1369, 372, 135, 127, 275, 38, 589, 481],
 [902, 24, 4326, 530, 2098],
 [724, 774, 23, 74, 1573],
 [726, 44, 2676, 1183, 355],
 [2099, 859, 13, 8, 24, 2],
 [2019, 16, 138, 1678, 4327, 353, 168],
 [9, 1215, 418, 2784, 512, 4328, 92, 18],
 [1216, 264, 980, 170, 13, 42, 13, 50, 55],
 [41, 4329, 4330, 72, 89, 1115, 111, 215, 332, 40],
 [387, 639, 462, 13, 116, 1, 2100, 1051, 107, 413, 1325, 2785, 5, 500, 116, 1],
 [307, 127, 72, 2786],
 [393, 71, 4331],
 [1318, 12, 28, 4332, 2787, 543],
 [1379, 12, 164, 479],
 [1679, 4333, 128, 372, 574, 247, 171],
 [1046, 11, 1331, 70, 604, 156],
 [1680, 1402, 2084],
 [16, 23, 259, 133, 196],
 [6, 67],
 [1679, 4334, 7, 2788, 7, 4335, 1407, 4336, 600, 4337, 247],
 [1190, 118, 2789, 603, 433, 2],
 [1399, 4338, 1399, 2655],
 [293, 1053, 419, 2790, 1681],
 [1605, 25, 1054],
 [296],
 [112, 301, 4339, 373, 772, 373, 93, 1016, 68, 138],
 [270, 362, 4340, 2791, 38, 255, 64],
 [369, 38, 1217, 302, 108],
 [7, 166, 175, 4341, 2101, 872, 125, 317],
 [161, 102, 661, 123, 1218, 27, 271, 445, 10, 76],
 [2792, 13, 2, 2793, 299],
 [52, 215, 352, 39, 142],
 [194, 3, 139],
 [307, 27, 99],
 [],
 [1051, 204, 120, 332, 42, 2794, 2102, 1172, 434, 220],
 [9, 208],
 [171, 673, 752, 129, 57, 141, 35, 283, 318, 46],
 [1408, 27],
 [706, 28, 2103],
 [653, 125, 1682, 9, 542, 628, 857, 2795, 2104, 302],
 [1683, 175, 63, 557, 71, 143, 2105, 27, 59, 265, 2796, 884, 4342],
 [30, 341, 341, 199, 2106, 4343, 658, 2107, 160, 507, 797, 886],
 [604, 254, 212, 85],
 [127, 770],
 [812, 1151, 416],
 [270, 7, 380],
 [45,
  1684,
  1216,
  430,
  4344,
  14,
  2108,
  4345,
  4346,
  1317,
  4347,
  4348,
  4349,
  2797],
 [1219, 813, 62, 53, 27],
 [9, 241, 57, 359],
 [71, 143, 2798, 4350, 2109, 65, 1367, 56, 156, 202],
 [254, 212, 1685, 4351, 1686],
 [179, 1220, 3, 69, 86, 603, 156, 461, 1409, 179, 1220, 3],
 [21, 2, 163],
 [1638, 1000, 210],
 [1574, 4352],
 [9, 1021, 15, 1687, 4353, 2102],
 [9, 83, 1199, 11, 267, 1410, 533, 335],
 [4354, 18, 537, 205, 4355],
 [4356, 4357, 4358, 4359, 4360, 242, 1411, 1221, 1920],
 [4361, 4362, 73, 2110, 399, 14, 159, 4363, 4364, 4365],
 [206, 103, 2799, 4366, 26, 2800, 983, 96, 4367, 183, 1364, 1222, 4368],
 [32, 849, 1055],
 [75, 546, 126],
 [1056, 1412, 927, 1413, 603, 570, 1414, 31, 2111, 20, 656, 4369],
 [1415, 482, 989, 2801, 1556, 2687, 961, 4370],
 [15, 605, 181, 558, 8, 392, 348, 2019, 226, 4371, 4372, 4373, 4374],
 [148, 4, 522, 2039, 2802, 28, 135],
 [236, 64],
 [52, 1223, 161, 1224, 1338, 26, 276, 814],
 [575, 372, 329, 50, 249, 51, 14, 1322, 2803, 4375],
 [659, 928, 929, 353],
 [393, 197, 420, 815],
 [796, 2112, 456, 853, 525, 1225, 1174],
 [796, 33, 3, 24, 255, 55, 47],
 [95, 106, 4376, 4377],
 [170, 4378, 522, 32, 13],
 [223, 351, 2804, 4379, 816, 384, 14, 356, 159, 4380, 4381],
 [17, 36, 22, 1226, 43, 559, 51, 753, 23, 525, 20, 526, 111, 17, 22, 319],
 [226, 33, 3, 510, 77, 23, 662, 203, 5],
 [1649, 26, 18, 28, 713, 1197, 135, 419, 2113],
 [4382, 1416, 11, 2114, 4383, 2805, 4384, 18, 115, 439],
 [318, 396, 242, 727, 1144, 1580, 861, 2115, 86, 655, 461, 1223, 1417, 2080],
 [176, 4385, 4386, 14, 2806, 43, 266, 4387],
 [4388, 97, 2807, 660, 4389, 2808],
 [1, 6, 65],
 [930, 10, 187, 303],
 [9, 2609, 10, 105],
 [921],
 [1688,
  4390,
  20,
  469,
  352,
  1227,
  638,
  4391,
  2116,
  218,
  225,
  269,
  493,
  2117,
  173,
  268,
  189,
  194,
  1389],
 [207, 66, 241, 216, 788, 804, 4392, 1228, 638, 988],
 [215],
 [433, 4393, 2809, 931, 4394, 2049, 606, 2118, 335, 2810, 8, 2, 121],
 [536, 90, 251],
 [4395, 33, 241, 75, 313, 13, 453, 2621, 169, 123],
 [790, 1203, 817, 818, 1142, 59, 2119, 11],
 [463, 219, 1, 4396, 1206, 491, 4397, 54, 1689, 148],
 [206, 103, 2811, 309, 177, 1418, 1354, 1172, 96, 259, 2120, 1419],
 [152, 775, 4398, 2812, 1420, 1057, 4399],
 [4400, 4401, 4402, 819, 1602, 295],
 [2006, 129],
 [1421, 4403],
 [2813, 4404, 4405, 1229, 510, 2814, 2815, 235, 409, 1230, 4406, 4407, 4408],
 [37, 64, 19, 4409, 92],
 [2121, 100, 250, 32, 144, 1163, 9],
 [78, 171, 219, 1676, 1690],
 [1046,
  104,
  928,
  353,
  168,
  4410,
  4411,
  4412,
  2122,
  2816,
  4413,
  1608,
  2817,
  2818,
  4414,
  289,
  4415,
  701,
  4416,
  2123,
  4417],
 [818, 2124, 244, 2819, 4418, 858, 43, 4419],
 [1, 138, 275, 4420, 3, 114, 4421, 25, 188, 18, 2820, 62, 3, 2125],
 [1172, 1231, 854, 14, 391, 896, 4422],
 [1655, 185, 65, 4423, 65, 901, 1422, 1],
 [45,
  2821,
  245,
  2126,
  4424,
  350,
  15,
  1204,
  4425,
  4426,
  2822,
  4427,
  4428,
  4429],
 [4430, 224, 4431, 2823, 4432, 1187, 42, 2127, 1407, 2824, 4433, 34, 4434],
 [2128, 12, 1],
 [71, 143, 1387, 463, 2825, 1423, 2826, 419, 932, 207],
 [41, 1, 916, 4435, 1973, 48],
 [347, 331, 2129, 1424, 324, 16],
 [228, 1132, 59, 847, 673, 1058, 36],
 [971, 5, 116, 3],
 [622,
  20,
  808,
  218,
  52,
  161,
  1691,
  2827,
  2828,
  142,
  4436,
  663,
  2130,
  1314,
  2131,
  126],
 [5, 433, 782, 170, 49, 234, 313, 31, 55],
 [1232,
  5,
  646,
  466,
  225,
  371,
  1059,
  1692,
  156,
  70,
  1651,
  550,
  2829,
  403,
  1233,
  1047],
 [203, 29, 2132, 600, 196, 62, 5, 499],
 [113, 207],
 [1138, 4437, 2830, 16, 178, 1033, 183, 2133, 1425, 78, 4438],
 [4439, 23, 637, 2831, 1426, 1234],
 [398, 1180, 4440, 404, 359, 1427],
 [398, 442, 331, 4441, 82, 446, 2, 662, 13],
 [9, 66, 1183, 16, 99, 21],
 [2117, 2832, 85, 4442, 1235, 459, 430, 235, 1236, 414, 624, 1428, 1060],
 [213, 239, 69],
 [118, 2, 4443, 688, 55, 47, 55, 2005],
 [2646, 66, 470, 409, 44, 437],
 [387, 4444, 4445, 903],
 [15, 2833, 2134, 4446, 4447, 1170, 607, 14, 4448, 4449],
 [41, 236, 2, 29, 710, 688, 196, 1237, 1192],
 [45, 4450, 95, 926, 728, 150, 39, 1127, 591, 933, 1061, 1421, 142, 4451],
 [4452, 31, 467, 2135, 528, 1955, 4, 60, 2136, 136],
 [547, 151, 346, 406, 327, 44, 141, 233, 57, 233, 429, 414, 50, 283],
 [213, 174, 403, 4453, 1349, 4454, 1693, 4455, 1378, 1238],
 [4456, 2834, 72, 2137, 934, 288, 895, 153],
 [362, 2070, 889, 96, 11, 845],
 [403, 27, 5, 4, 726, 21, 10],
 [1239],
 [820, 680, 1429, 4457, 11],
 [68, 1182, 4458, 656, 498, 77, 73, 41, 57],
 [81, 188, 19, 1965],
 [4459, 621, 39, 436, 172, 233, 341, 151, 691, 18, 1617],
 [4460, 2835, 2836, 1135, 21, 632],
 [587, 12, 31, 216, 164, 85],
 [711, 28, 39],
 [2012, 83, 4461, 1038, 1964, 4462, 4463, 4464],
 [727, 151, 346, 313, 31, 93, 57, 608, 4465, 632, 93],
 [4466, 992, 5, 185, 2114, 546, 1212],
 [1694, 2837],
 [532, 8, 63, 101, 79, 400, 2, 178],
 [2838, 33, 222, 28, 2],
 [12, 109, 592, 4467, 4468],
 [291, 1, 90, 4469, 49, 2138, 1695, 2089, 195, 183, 2, 729],
 [4470, 2139, 2140, 80, 4471, 447, 4472, 4473],
 [173, 268, 493, 730, 494, 2839, 39, 231, 1198, 2840, 50],
 [2841, 1240, 437],
 [218, 4474, 4475, 262, 4476, 225, 2141, 1062, 31, 4477, 109, 781, 160],
 [601, 7, 385, 919, 606, 1696],
 [1225, 2842, 2708, 4478, 606, 1, 1697, 212, 87, 44],
 [45, 2072, 4479, 644, 83, 471, 201, 800, 448, 4480, 250, 4481, 4482, 556],
 [2843, 4483, 2844, 935, 644, 278, 660, 4484, 4485],
 [311, 4486, 1358, 2845, 2846, 1325, 6, 586, 182, 1698],
 [4487, 2142, 168, 1699, 371, 6, 513],
 [335, 99, 1362, 21, 324, 335, 683, 335, 1137, 683, 791],
 [81, 25, 19, 2847],
 [2848, 1700, 873, 1426, 211, 4488, 5, 108, 1241, 637, 476, 4489],
 [24, 524, 323, 99, 21, 323, 4490, 188, 2143, 936, 560, 323, 662, 13, 323],
 [30, 4491, 1242, 1669, 681],
 [113, 646, 199, 2849, 1160, 1430],
 [9, 26, 1701, 121, 57, 1702, 774],
 [1703, 56, 256, 1431, 55, 4492],
 [171, 1243, 266, 132, 2850, 1704, 1042, 2851, 409, 1705, 1148, 471, 397],
 [340, 6, 156, 653, 125, 1063, 20, 460],
 [2852,
  70,
  4493,
  2563,
  2619,
  54,
  241,
  11,
  514,
  553,
  4494,
  310,
  75,
  95,
  119,
  1015],
 [31, 275, 182, 21, 4495, 937],
 [701, 1984, 428, 1985, 1014, 195, 1986, 1987, 10, 187],
 [95, 106, 559, 540, 191, 923, 780, 2853, 938, 4496, 4497],
 [173,
  268,
  718,
  67,
  371,
  731,
  20,
  383,
  84,
  561,
  1064,
  561,
  2854,
  2855,
  494,
  730,
  269,
  2856,
  1244,
  2857,
  173],
 [174, 789, 657, 4498, 170, 4499, 114, 31, 138],
 [1245, 107, 72, 304, 4500, 764, 260, 4501, 1229, 2858, 1035, 608, 31, 174],
 [1339, 3, 119, 144, 110],
 [232, 385, 1246, 4, 39, 119, 2859, 386],
 [237, 24],
 [4502, 115, 4503],
 [483, 28, 2144, 222, 2860, 140, 2861],
 [37, 25, 19, 4504],
 [2145, 191, 224, 20, 1015, 2862, 108, 4505, 4506, 502],
 [4507, 641, 4508, 1031, 4509, 2146, 120, 42, 541],
 [7,
  555,
  166,
  562,
  434,
  2062,
  2147,
  2001,
  4510,
  1706,
  4511,
  424,
  4512,
  2863,
  1056,
  2545,
  1119,
  4513,
  2864],
 [1432, 1061, 4514, 329, 95, 582, 246, 51, 14, 153, 34, 4515, 4516],
 [4517, 98, 21, 26, 48, 195, 5, 2148],
 [30, 322, 59, 265, 112, 91, 447],
 [404, 2149],
 [403, 98, 229, 7, 40, 85],
 [596, 716, 515, 548, 716, 203, 373, 54, 1707, 373, 137, 2865, 208],
 [2150, 2151, 939, 4518, 2150, 732, 267, 4519, 484],
 [401, 110, 327, 54, 1247, 868, 48, 563],
 [1065, 24, 7, 40, 2866, 104, 301, 55],
 [62, 1066, 1433, 1213, 2867, 35, 1708, 1434, 2152],
 [65, 508, 644, 83, 4520, 940, 664, 846, 1709],
 [686,
  2868,
  4521,
  287,
  122,
  4522,
  4523,
  2869,
  173,
  268,
  69,
  158,
  215,
  25,
  2870],
 [226, 1710, 24, 5, 2, 477, 131, 40, 286],
 [2153, 27, 53, 3, 124, 1711],
 [732, 560],
 [4524, 665, 1034, 2154, 4525, 7, 40, 58, 163, 1248],
 [171, 202, 463, 226, 30, 142, 1712, 2871],
 [147, 12, 93, 4526, 2155, 654, 1435, 1713],
 [330, 2872, 2873, 80, 60, 44],
 [112, 2156, 363, 821, 134, 1714, 2874, 593, 76, 1929],
 [226, 133, 72],
 [733, 1713, 2003, 19, 28, 157, 169, 31, 195],
 [189, 55, 18, 454, 1715, 197],
 [9, 1, 352, 315],
 [244, 4527, 4528, 516, 1420, 2875, 2876, 7, 1156, 2877],
 [211, 1, 15, 119, 67, 1698, 645, 583, 4529, 497, 145],
 [17, 239, 36, 13, 22, 316, 1436, 111, 100, 43, 17, 22, 319],
 [9, 83, 1199, 177, 83, 432],
 [307, 1, 50, 2878, 157, 4530, 4531, 48, 35, 1437, 4532],
 [148, 32, 82, 148, 23, 2027, 4, 4533, 374, 11],
 [4534, 852, 1067, 4535, 4536, 2683],
 [609, 128, 10],
 [296, 5, 2879, 1063, 276, 2880, 84],
 [2, 13, 324, 1068],
 [2157, 1249, 2881, 4537, 21, 10, 102, 2158, 4538],
 [679, 3, 1716],
 [971, 271, 779, 123, 10],
 [202, 4539, 252, 2882, 816, 4540, 4541, 2159, 1717, 112, 89, 1018, 4542],
 [822, 149, 4543, 33, 5],
 [49, 345, 98, 630, 163],
 [213, 4544, 69, 2135, 2883, 379],
 [278, 2884, 243, 935, 343, 485, 356, 4545, 4546],
 [31],
 [393, 146, 105, 4, 79, 3, 2885, 4547, 26, 4548],
 [9, 31, 4549],
 [1638, 12, 1438],
 [97, 48, 256, 711, 722, 67],
 [287, 7, 631, 50, 88],
 [213, 54, 7, 4550, 263, 77, 32],
 [4551, 16, 217, 10, 240, 126],
 [1176, 1],
 [4552, 4553, 468, 369, 4554, 600, 2160],
 [22, 4555, 282, 80, 4556, 14, 4557, 96, 4558, 4559],
 [705, 6, 76, 21],
 [511, 21, 562, 4560, 769, 1, 58, 236, 6, 1, 92, 16, 1017, 13, 35],
 [17, 455, 36, 35, 51, 14, 159, 1718, 4561],
 [1694],
 [601, 1439, 260, 20, 2886, 384, 68, 991, 50, 408, 2887, 4562, 819, 4],
 [1144, 8, 63, 917, 47, 203, 2, 178],
 [35, 40, 35, 214, 4563, 26, 131, 94, 29],
 [799, 1719, 1440, 531, 219, 620],
 [310, 77, 4564, 52, 161, 1664, 35],
 [9, 121],
 [37, 25, 19, 2161, 61, 1],
 [86, 27, 313, 212],
 [1441, 917, 4565, 2162],
 [2163, 1, 114, 24, 1720, 58, 23, 1227, 435],
 [642, 1001, 272, 411, 1909, 2888, 4566],
 [2685, 389, 135, 38, 308, 4567],
 [9, 208],
 [724, 223, 221, 814, 276, 1069, 192, 2618, 65, 628, 401, 359],
 [41, 104, 564, 415, 180, 27, 1442, 449, 180, 197, 58],
 [2164,
  2889,
  2165,
  1070,
  4568,
  2663,
  2020,
  1351,
  2166,
  2890,
  4569,
  4570,
  4571,
  4572,
  4573],
 [732,
  74,
  600,
  4574,
  2891,
  219,
  1334,
  494,
  2805,
  1566,
  2892,
  2583,
  4575,
  74,
  4576,
  2891,
  280],
 [2893, 4577, 4578, 1721, 83, 272, 85, 2167, 941, 2894, 212, 4579, 4580, 4581],
 [1587, 541, 339, 487],
 [1722, 1, 102, 661, 123, 238, 779, 123],
 [86, 4582, 2168, 1192, 210, 28],
 [4583, 511, 540, 277, 2766, 112, 2169, 112, 2170],
 [4584],
 [706, 33, 3, 446, 229],
 [2171, 761, 320, 281, 4, 18, 344, 780, 20, 942],
 [473, 8, 3],
 [1723, 893, 326, 2172, 4585, 2104, 927, 4586, 2173, 210, 4587, 823, 999],
 [1250, 104, 388],
 [148, 64, 170, 4588, 477, 106, 168, 134, 38],
 [4589, 53, 479],
 [862, 760, 91, 308, 465, 51, 14, 4590, 4591],
 [812, 2174, 249, 517, 1724, 32, 872, 208],
 [14, 153, 2175, 2895, 1070, 4592, 391, 4593, 4594, 34, 4595],
 [41, 64, 972],
 [128],
 [2896, 504, 462, 2897, 4596, 4597, 4598, 1161, 4599],
 [8, 619, 539, 173, 930, 2176],
 [109, 21, 610, 21, 136, 116, 500, 1, 38],
 [60, 58, 6, 626, 7, 40, 70, 129, 611, 4600, 15, 1562],
 [2177, 2178, 31, 104, 182, 143, 31, 943],
 [189, 117, 400, 522],
 [83, 308, 703, 2179, 1179, 1071, 209, 209, 2898, 4601, 4602],
 [30, 1724, 10, 129, 131, 709, 32],
 [2899, 8, 63, 47, 2, 178],
 [378, 406, 335, 1030, 42],
 [41, 195, 169, 2011, 388, 4603, 29, 2900, 180],
 [1174, 213, 858],
 [2180, 322, 59, 265, 1443, 337],
 [858, 1068, 4604, 2901, 544, 38],
 [4605, 4606, 2552, 2902, 1603, 4607, 166, 2098, 4608, 1019, 782],
 [907, 20, 508, 1444, 1445, 282, 2181, 4609],
 [930, 10, 187, 303],
 [189, 24, 688, 4610, 24, 2, 55, 299],
 [1251, 580, 2712, 130, 1173, 2903, 557, 4611, 1203, 867, 666],
 [4612, 4613, 32, 2076, 74, 2904, 4614, 74, 21],
 [1725,
  998,
  4615,
  899,
  1379,
  4616,
  1155,
  4617,
  1072,
  773,
  4618,
  52,
  268,
  335,
  1446],
 [604, 8, 63],
 [9, 4619, 2182, 409, 262, 1073, 1979],
 [584, 105, 388, 183, 136, 56, 533, 334, 1058, 4620, 2905, 232],
 [2183, 42],
 [2906, 135, 47, 4621, 589, 4622, 4623, 4624],
 [4625, 216, 1055, 21, 19, 28, 1708, 4626, 607, 279],
 [86],
 [2184,
  547,
  44,
  2907,
  281,
  944,
  2908,
  418,
  4627,
  945,
  4628,
  1362,
  70,
  122,
  258,
  1037,
  1726],
 [1252, 2185],
 [724, 624, 154, 11, 4629, 277, 112],
 [2909, 450],
 [2186, 424],
 [1630, 946, 889, 11, 362, 184, 13],
 [1727, 207, 120, 479],
 [502, 480, 102, 49],
 [81, 64, 19, 2910],
 [4, 30, 60, 2911, 69, 155, 2187, 4630, 69],
 [506, 171],
 [30, 328, 72, 401],
 [26, 18, 186, 15, 58],
 [97, 610, 411, 1447, 306],
 [2912, 142, 3, 4631, 1060],
 [2913, 2914, 2814, 1448, 354, 132, 907, 95, 4632, 4633],
 [37, 25, 19, 2915],
 [206, 103, 824, 1728, 5, 516, 203, 4634, 2188, 100, 4635],
 [293, 415, 986, 373, 962, 38, 4636, 48, 4637],
 [484, 1729, 458, 2916, 93, 151, 283, 1730, 39, 69, 732, 21],
 [52, 1066, 313, 13],
 [101, 280],
 [9, 71, 1449, 887, 198, 947, 198, 255, 887, 208],
 [541, 27, 5, 120, 40],
 [1731, 33, 5],
 [17,
  926,
  529,
  36,
  22,
  678,
  51,
  111,
  2917,
  4638,
  2018,
  2680,
  2189,
  525,
  4639,
  4640],
 [194, 4641, 867, 817, 1352, 4642, 2190, 107, 825, 244, 31, 4643, 4644],
 [81, 25, 19, 4645],
 [121, 826, 404, 826, 404, 826, 281, 1726],
 [30, 4646],
 [4647, 16, 4648, 140, 2918, 2919],
 [278, 1178, 2920, 4649, 243, 343, 399, 149, 827, 4650, 4651],
 [16, 428, 10, 4652, 76],
 [65, 1450, 4653, 788, 204, 64, 562, 667, 31],
 [1595, 248, 75, 4654, 2191],
 [1064, 748, 261, 5, 231, 20, 3, 4655, 731, 184, 1675, 2921],
 [26, 873, 4656, 7, 1025, 306, 1074, 26, 2599, 346, 58, 6, 41, 173],
 [54, 226, 1025, 30, 152, 44, 4657, 119, 133, 2660, 32],
 [4658, 4659, 4660, 33, 999, 13, 604, 2922, 2923, 2192, 62, 21, 4661, 4662],
 [494, 271, 123],
 [248, 2185, 792, 2193],
 [1706, 734, 216, 72, 1732, 85],
 [1733, 731, 267, 69, 1245, 35, 630, 208],
 [4663, 272, 426, 26, 77, 2924],
 [27, 12],
 [17, 425, 36, 22, 131, 51, 753, 23],
 [70, 4664, 537, 416, 24, 221],
 [4665, 826, 4666, 552, 2925, 1253, 57, 30, 4667, 4668, 386],
 [1254, 289, 325, 773, 16, 23, 259, 133, 196],
 [334, 12, 74, 1075, 281, 4669, 4670, 149, 479, 1255],
 [1553, 292, 12, 788, 446],
 [4671, 28],
 [648, 204, 1076, 119, 2, 136, 2194],
 [4672, 57, 931, 300, 149, 633, 91, 1],
 [147, 2926],
 [398, 668, 37, 127, 654, 21],
 [1734, 8, 63, 252, 2, 21, 47],
 [2195, 11, 2196, 146, 4673, 16, 258],
 [2927, 23, 4, 500],
 [9, 216, 864, 588, 735, 4674, 4675, 251],
 [642, 91, 65, 1060, 279, 4676, 4677, 4678],
 [112, 185, 23, 405, 376, 1931, 4679, 112],
 [1450, 21, 1735, 136, 38],
 [3, 3, 484, 130, 1211, 4680, 2110, 1736],
 [11, 4681, 2928, 2929, 4682, 2930, 4683, 334, 485, 39],
 [2931, 150, 322, 59, 265, 1451, 91, 112, 447],
 [532, 1077, 205, 62, 7, 210, 1029],
 [10, 187, 303],
 [278, 2853, 1452, 15, 370, 51, 354, 266, 34, 994, 4684],
 [1041, 501, 4685, 1708, 1434],
 [1130, 57, 306],
 [132, 395, 77, 236, 150, 263, 4686, 2177, 1707, 5, 1393],
 [296, 920],
 [4687, 364, 777, 29],
 [2197, 4688, 853, 198, 1394],
 [81, 25, 19, 2932],
 [718, 617, 79, 2041, 26, 340, 131, 332, 42, 340],
 [171, 54, 1671, 359, 828, 1256, 59, 2933, 35, 4689, 1453, 35],
 [234, 192, 7, 40, 2010, 366, 93, 28],
 [45, 2934, 4690, 862, 760, 2198, 34, 4691, 34, 4692, 385],
 [948, 59, 237, 1006, 552, 1944, 1627, 59],
 [1135, 263, 4693, 4694, 1253, 115, 729, 263, 4695, 1594],
 [1454, 32],
 [4696, 72, 24],
 [2024, 318, 163, 17, 22, 2199, 4697],
 [45,
  1136,
  4698,
  274,
  103,
  309,
  2935,
  649,
  1257,
  4699,
  1258,
  4700,
  2678,
  485,
  1259],
 [4701, 2936, 2937, 1260, 4702, 4703, 2938],
 [78, 105, 2200, 2201, 4704, 2939],
 [695, 2940, 43, 769, 89, 4705, 641, 1596, 4706, 4707],
 [949, 1261, 441, 82, 7, 49],
 ...]
In [53]:
test_sequences
Out[53]:
[[342, 1159, 2168, 6264, 11, 8338, 39],
 [],
 [4860, 1759, 6693, 7978, 6687, 2709],
 [1144, 1628, 84],
 [45, 2821, 1142, 1272, 14, 73, 3445, 8091, 1456, 939, 1310, 1163],
 [237, 11, 30, 154, 156, 26, 1660, 59, 2260],
 [43, 93, 1865, 640, 5909, 607, 1128, 201, 8722],
 [26, 643, 235, 42, 794, 32, 35, 7188, 8290, 1082, 5, 992, 58, 3847],
 [2265, 1722, 2929, 3898, 62, 224, 134, 55],
 [572, 1284, 2196],
 [1186, 1629, 6067, 344, 1356, 922],
 [81, 64, 19, 16, 1],
 [33, 31, 16, 496, 10, 187],
 [385, 4, 21, 10, 27, 5, 254, 170, 4012, 8366, 241, 44, 2783],
 [161, 311, 65, 3264, 1, 91, 322, 59, 265, 1883, 93, 6817, 574, 843],
 [475, 322, 59, 265, 91, 91, 447, 112, 447],
 [228, 243, 935, 343, 940, 356],
 [1781, 123, 171, 2577, 11, 306, 171, 26, 2520, 263, 73, 208],
 [16, 646, 189, 2080, 52, 8245, 2964, 1239, 1889, 2750, 4107],
 [113, 2019],
 [724, 20, 8, 1832, 724, 156],
 [45, 1677, 2213, 4743, 87, 1089, 905],
 [28, 48, 579, 117, 16, 46, 166, 277, 60, 92],
 [48, 3, 105, 13, 215, 441, 304],
 [1657, 270, 204, 1100, 62, 1470, 956, 15, 1282, 85],
 [848, 6594, 234, 5, 255, 168],
 [145, 120, 1, 1218, 142, 284, 149, 458],
 [8118, 7, 2074, 29, 356, 43, 176, 1222],
 [293, 2598, 266, 512, 590, 165, 2600, 18],
 [2778, 931],
 [3240, 1241, 242, 110, 252, 118, 48, 119, 110],
 [4, 105, 1033],
 [290, 1276],
 [616, 1571, 49],
 [143, 3446, 23, 2294, 276, 604],
 [882, 4810, 1316, 226, 595, 4373, 8385, 6543, 669, 541, 1176, 2245],
 [100, 20, 1044, 96, 201, 302, 745, 165, 1106],
 [979, 8, 137],
 [6, 33, 446, 2, 4653, 2470, 141, 833, 518, 178, 2],
 [1500, 12, 299, 168, 221, 48],
 [81, 25, 19],
 [17, 239, 36, 13, 22, 316, 1436, 297, 100, 17, 22, 319],
 [41, 38],
 [71, 23, 20, 2017, 1094, 106],
 [245, 1171, 233, 955, 390, 708, 29],
 [727, 1007, 12, 7249, 12, 92, 65, 130, 57],
 [14, 764, 282, 1820, 356, 159],
 [7, 166, 6886],
 [1, 138, 756, 4299, 1859, 123],
 [9, 10, 187, 303],
 [3441, 318, 17, 22, 14, 204, 100, 43],
 [1971, 33, 617, 90, 195, 336, 523, 169, 772, 275],
 [226, 8, 63, 888, 47, 2, 178],
 [7701, 5223, 7297, 1330, 176, 11, 890, 9, 3658, 70],
 [4567, 511, 5793, 1],
 [24, 21, 29, 38, 74, 1276],
 [56, 25, 87],
 [1760, 95, 625, 5283, 3135, 20, 2240, 5284, 5285],
 [369, 79, 313, 500],
 [3370, 368, 64, 28, 125, 58, 1048, 317],
 [98, 6922, 199, 90, 1165, 315, 4584],
 [45, 3541, 698, 1267, 410, 6981, 565, 6982, 3419, 6983],
 [7, 1705, 1279, 233, 2874, 4011, 1484, 18],
 [1583, 3023, 125, 58, 1139, 853, 1652, 1063, 1085, 3259, 3411],
 [615, 3193, 6855, 630, 1029, 839, 196, 839, 564, 169, 53, 29, 68, 113],
 [885, 79, 11, 11, 577],
 [218, 810, 1046, 155, 172, 6135, 1600, 3, 30, 177, 1],
 [58, 521, 23, 1139, 3113, 1428, 694],
 [707, 42, 2504, 882, 230, 681, 36, 725, 4106, 394, 910, 4062, 595, 1347],
 [251, 1547, 1235, 459, 2324, 397, 145, 2185, 1752],
 [2750, 4, 12, 216, 848, 284, 436, 1359],
 [830, 368, 1346],
 [91, 65, 12],
 [340, 1, 138, 38],
 [30, 119, 1, 94, 31, 74, 104],
 [289, 27, 173, 380, 3320, 420, 150, 480, 2092, 993, 281],
 [720, 58, 87, 971, 467],
 [436, 172, 6, 466, 177, 702, 282, 702, 233, 341, 151, 691, 18, 1617],
 [81, 64, 19, 16],
 [9, 149, 49],
 [1, 1802],
 [203, 5, 475, 44, 1705, 203, 85, 873],
 [886, 3520, 886, 981],
 [185, 23, 405, 376, 1635],
 [6, 8427, 1276, 2681, 594, 594, 594],
 [229, 1361, 39, 196, 632, 414],
 [12, 210, 453, 997, 215, 8752, 965, 222, 286, 301, 69],
 [207, 48, 21, 204, 210, 195, 366, 12, 8236, 10, 191, 29, 480],
 [41, 332, 40, 364, 148, 2742, 229, 93],
 [6, 4623, 148, 5916, 7695, 172, 371, 581],
 [653, 125, 156, 9, 173, 268, 1375, 3330, 472],
 [572, 28, 56, 29],
 [15, 1196, 827, 1179, 417, 2113, 186, 3408, 3247, 165, 324, 34, 8717],
 [858, 3801, 550],
 [2470, 3002, 158, 7425, 89],
 [194, 1, 2886, 134, 129, 290],
 [438, 304, 328],
 [1376, 1028, 1591, 2007, 179, 1715, 4831, 872, 4443, 14, 2613, 3050, 34],
 [2698, 28, 183, 2, 234, 192, 215, 40],
 [62, 1486, 2202, 61, 1, 208],
 [2956, 8875],
 [714, 1729, 68, 894, 466, 181],
 [243, 935, 343, 14, 1459, 1336, 149, 554, 266, 962],
 [1620, 19, 162, 227, 72, 89, 48],
 [1,
  646,
  2827,
  1227,
  435,
  2019,
  583,
  145,
  4529,
  497,
  438,
  2339,
  2480,
  1,
  4661,
  218],
 [1091, 3, 4, 23, 42, 5246, 75, 326, 179, 10, 8807],
 [6, 469, 62, 3, 2, 15, 843],
 [403, 112, 317, 5775, 638],
 [10],
 [648, 305, 119, 2, 13],
 [277, 54, 144, 828, 118, 10, 445],
 [845, 311, 28],
 [8821, 10, 1394, 357, 82],
 [7108, 170, 64, 353, 49],
 [2130, 2753],
 [1169, 1859, 272, 3117, 2357, 157, 1103],
 [200, 3053, 762, 658, 219, 26, 278, 831, 150],
 [438, 419, 3399, 7270, 418, 143, 2393, 745, 78, 1625, 2442],
 [3638, 3478, 1630],
 [56, 286, 352, 1269, 57, 54, 1773, 255, 84, 1006, 422, 379],
 [8205, 168, 27, 229, 24, 48, 18],
 [170, 579, 526, 13, 1406, 2042, 42, 67, 6342, 214, 1804],
 [398, 12, 110, 1350, 198, 379, 133],
 [6, 7805, 10, 445],
 [12, 232, 3604],
 [1, 633, 3538],
 [584, 46, 570, 325, 334],
 [6587, 3516, 380],
 [1281],
 [74, 1, 32, 23, 1],
 [342],
 [1659],
 [1692, 27, 5, 66, 264, 8, 357, 137],
 [1630, 237, 755, 362, 2214],
 [340, 6, 44, 437, 58, 3206, 234, 3, 119, 3755, 44, 495, 154],
 [41, 879, 726, 214, 16, 144, 1236],
 [749, 11, 550, 74, 284, 124, 942, 304, 1482, 54],
 [2408, 1405, 27, 23],
 [288, 1826, 890, 3293, 305, 3701, 14],
 [280, 12, 221, 210, 164, 1, 1218, 7333, 1483, 344, 1352, 2173, 7125],
 [226, 33, 5, 48],
 [8664, 3168, 333, 76, 10],
 [1, 97],
 [15, 27, 53, 54, 1311, 125, 70, 9],
 [37, 25, 188, 19],
 [2128, 227, 72, 433, 1926, 1],
 [741, 56, 468, 637, 156, 476, 109, 2077],
 [41, 1095, 62, 1494, 4050],
 [101, 4, 5284, 53, 352, 914, 53, 697, 108, 7, 1097],
 [9],
 [3375, 1762, 3717, 7, 2306, 807, 8586, 473, 870, 870, 1025],
 [166, 513, 85, 1549],
 [27, 171, 1716, 697],
 [1889, 128, 485],
 [323, 1272, 68, 889, 1296, 814, 4, 296],
 [2739, 6517, 31, 21, 25],
 [588, 3638],
 [861, 375, 128],
 [7331, 4519, 698, 1154, 7567],
 [244, 4759, 15, 88, 44, 2032, 777, 5, 4627, 299],
 [101, 258, 93, 1255],
 [899, 12, 314, 355, 379, 84],
 [11, 1552, 6114, 1483, 1042, 167, 430, 7, 207],
 [555, 166],
 [37, 25, 19],
 [6407, 5553, 1470, 65, 3264, 6451, 91, 91, 574, 843, 6817, 6817],
 [547, 1803, 4112],
 [70, 1897, 75, 1495, 185, 46],
 [3233, 485, 1, 597],
 [1627, 199, 7039, 416, 3438, 55, 193, 1, 4, 850],
 [80, 7403, 425, 243, 343, 51, 14, 1117],
 [433, 424, 162, 221, 29, 64, 1001, 99],
 [1379, 568, 608, 92, 1082, 19, 1379],
 [519, 95, 39, 857, 728, 6431, 3580, 2466],
 [161, 234, 3, 11, 32],
 [1775, 3199, 58, 119, 301, 5091, 586],
 [8, 1281, 12, 1, 129, 1454],
 [662],
 [117, 1973, 281],
 [1565, 3821, 83, 2215, 1653, 5187, 16, 496, 10, 187],
 [71, 569, 57, 44, 6645, 939, 209, 280],
 [595],
 [296, 656, 3184, 179, 1220, 3, 463, 334],
 [2081, 1398, 703, 77, 2082, 1173, 224, 6868, 2897],
 [393, 99, 82, 5, 4, 84],
 [38, 167, 311, 59, 553],
 [7, 774, 844, 2301, 679, 205, 3106, 7132, 523],
 [1151, 117, 1342],
 [2333, 3842, 2651, 2334, 3161],
 [7358, 2, 1158, 11, 339, 524, 47],
 [1423, 8705, 492, 939, 95, 974],
 [1471, 865, 101],
 [228, 246, 1748, 1618, 794, 1058, 36],
 [9, 98, 1442, 35, 1021, 15, 1707, 255],
 [26, 447, 95, 106, 102, 666, 2023, 2476, 14],
 [378, 6824, 378, 335, 2, 50, 12],
 [56, 87, 215],
 [531, 5, 806, 407, 4454, 75, 6, 4741, 4539],
 [609, 123, 304, 2515, 10],
 [2200, 7903, 222, 162, 13, 106],
 [6, 41, 469, 211, 435],
 [1340, 56, 114, 234, 3, 610, 1953, 2, 518],
 [115, 1716, 166, 785, 418, 61, 1145, 4952, 363],
 [2539, 496, 634, 7785, 3458, 608, 819, 3458],
 [2387, 41, 140, 87],
 [304, 570, 187, 74, 345, 518],
 [74, 56, 223, 31, 169, 449, 2064, 3063, 3638, 4373, 1298, 970],
 [26, 138, 211, 1301, 309, 120, 44, 15, 1112, 183, 932, 2860, 140, 4],
 [74, 2041, 238, 23, 251, 1234],
 [5, 108, 2498, 876, 260],
 [1488, 1940, 4],
 [68, 82, 21, 6],
 [8477, 139],
 [3336, 368, 134, 203],
 [81, 25, 19, 2915],
 [2913, 2914, 243, 263, 448, 664],
 [700, 366, 502, 2, 7472],
 [37, 25, 19, 92],
 [151, 472, 360, 87, 953, 327, 44],
 [12, 127, 72, 28, 394, 874, 1191],
 [248, 12, 405, 405, 273, 3683, 860, 56, 33, 68, 219],
 [6],
 [27, 5, 68, 151, 142],
 [402, 1313, 4183, 297, 167, 794, 51, 448, 159, 132, 228, 571],
 [2744, 32, 523, 3, 2679, 2679, 204, 107, 1476, 168],
 [200, 117, 27, 12, 138, 324, 16, 4011, 1],
 [201, 399, 8199, 250, 153, 906],
 [149, 25, 46, 30],
 [669, 4],
 [2297, 119, 540, 2306, 1078, 18, 3604],
 [66, 5093, 210, 3],
 [384, 8631, 1105, 992, 32, 145, 384, 1819],
 [176, 6575, 3554, 3416, 783, 132, 228, 1797, 674, 614, 2964],
 [295, 119, 186, 357, 82, 1352],
 [3149, 101, 149, 843, 1, 1333, 344, 1483, 167, 1818],
 [30, 1713, 284, 23],
 [43, 40, 58, 385, 3499],
 [9, 546, 1357, 1216],
 [1318, 3147, 721, 7628, 136, 60, 436],
 [438, 98, 192],
 [403, 927, 1334, 3668],
 [45, 287, 7648, 6127, 1493, 644, 83, 1990, 1173, 644, 176, 2678, 1363, 8070],
 [17, 22, 179, 4783, 51, 100, 736, 14, 1113, 34],
 [2147, 322, 59, 265, 91, 528, 112, 371, 2776],
 [1938, 794, 2307, 282, 235, 266, 5638, 5639, 5640],
 [749, 1, 75, 1697, 467],
 [464, 71, 1449, 93, 154, 608, 59, 1622, 553, 6475, 775, 305, 23, 53],
 [2780, 1744, 185, 1102, 286, 29, 194, 3556, 185, 53, 5246],
 [1256, 535, 2929, 1, 467, 2770],
 [52, 744, 1191, 3248, 2070, 1456],
 [1839, 1303, 592],
 [686, 287, 1314, 122, 4522, 662, 7835, 351],
 [4687, 33, 5, 777, 126, 114, 777, 38, 114],
 [278, 2884, 409],
 [680, 866, 2592, 311, 1378, 1206, 165],
 [8, 63, 2, 252, 315, 47],
 [1620, 68, 120, 545, 4452, 217, 10, 684],
 [1263, 880, 993, 1441, 5, 358, 6578],
 [2276, 813, 98, 90, 1138, 1246],
 [107, 2070, 426, 168, 2121],
 [531, 3, 416, 29, 169, 449, 527, 2, 38, 10],
 [100, 95, 257, 186, 91, 2756, 3059, 73, 370],
 [766, 30, 10],
 [9, 149, 8181, 452, 842, 21, 37, 84],
 [74, 2135, 269, 239, 69, 815, 11, 3, 138],
 [782, 46, 8, 63, 72],
 [250, 3059, 132, 4284, 321, 2629, 225],
 [226, 834, 12, 216, 26, 85, 169, 195, 668],
 [342, 142, 1499, 53, 1269, 31],
 [1065, 68, 7, 40, 19, 53],
 [1742, 14, 7035],
 [1062, 182, 301, 66, 7, 40, 2],
 [197, 2817],
 [70, 632, 37, 683],
 [568, 179, 10],
 [4311, 12, 168, 311, 115, 24, 355, 881],
 [695, 883, 33, 3, 68, 1, 352, 352, 694],
 [750, 531, 244, 953, 198],
 [1961, 241],
 [202, 7, 88, 397, 266, 38],
 [3657, 809, 79, 236, 47, 110],
 [1429, 66],
 [323, 368, 2124, 1545, 166, 513],
 [33, 3, 2520, 758, 50, 3006, 104],
 [385, 8, 63, 47, 2, 252],
 [3, 1003, 1],
 [1034, 692, 433],
 [37, 25, 19, 3523],
 [17, 455, 36, 35, 410, 308, 51, 14, 159, 22, 2544, 17, 22, 319],
 [133, 663, 136, 28],
 [1826, 1601, 120, 338, 78, 140, 60, 185],
 [2657, 2640, 434, 2460, 5223, 919],
 [2114, 3, 154, 208, 427, 1248, 1213, 208],
 [9, 25, 887, 560, 32, 13, 3758, 80, 199, 5287, 84],
 [575, 1774, 35, 249, 34],
 [8, 63, 35, 1453, 2, 178],
 [1250, 1929, 2781, 1115],
 [2142, 142, 596, 201, 596, 97, 10, 1001, 414, 723],
 [630, 1082, 30, 1272, 13, 89],
 [1795, 710, 3229, 388, 1, 212],
 [52, 120, 1375, 6626, 1766, 210, 225, 1077, 1016, 8810, 713, 416],
 [814, 1141, 7676, 831, 177, 3010, 298, 1244, 154, 1494],
 [1815, 701, 23, 68, 4552],
 [41, 56, 373, 168, 234, 5, 13],
 [568, 1, 898],
 [425, 1514, 235, 354],
 [231, 39, 2994, 302, 257, 7990],
 [387, 347, 630, 107, 413, 596, 347, 645],
 [655, 1310, 38, 2320, 2253, 96, 59],
 [93, 154, 379, 65, 1111],
 [369, 151, 472, 360, 233, 752, 50, 283, 3262, 6015, 46, 406, 302, 233],
 [213, 12, 21, 84],
 [1656, 33, 192, 24, 1, 1],
 [1409, 12, 90, 114, 234, 192, 215, 190, 40],
 [2528, 119, 105, 118],
 [296, 74, 27, 149, 326, 46, 60, 4011, 311, 185, 8],
 [193, 55],
 [161, 840, 266, 1945, 4312, 7, 39, 38, 266, 266, 123],
 [17, 455, 36, 13, 22, 43, 24, 220, 356, 1014, 5950, 20, 526, 6904],
 [841, 19, 49, 66, 227, 770],
 [71, 3, 694, 3234, 990, 62, 3, 523, 5, 7624],
 [601, 1263, 4, 94, 46],
 [1233, 4049, 285, 134, 88, 1770, 867, 6422, 801, 1464, 2903],
 [1525, 871, 4, 489, 353],
 [8493, 391, 187, 209, 1504, 1763],
 [534, 8, 63, 1571, 47, 180, 1090, 2, 252],
 [95, 926, 728, 150, 39, 1127, 591, 933, 1061, 1421, 142, 1419, 1222, 2071],
 [1674],
 [1583, 224, 183, 61, 3221, 2066, 246, 3259],
 [5108, 5, 2],
 [427, 3162, 597, 332, 42, 1, 6, 1974],
 [393, 56, 1, 54, 50, 69],
 [747, 102, 807, 2226, 1073, 18, 1617, 233],
 [2646, 131, 332, 40],
 [289, 853, 21, 26, 167, 409, 1352],
 [398, 1280, 3391, 91, 175, 32, 116, 1218, 85],
 [17, 22, 163, 89, 35, 43, 842, 93, 1778],
 [187, 399, 390, 2073, 282, 606, 5062, 762],
 [1702, 2718, 8574, 2045, 2019, 230, 4153, 365, 281, 3691, 167],
 [45, 1884, 92, 1098, 409, 1285, 600, 274, 103, 554, 579, 1896],
 [247, 215, 405, 191, 13, 31],
 [15, 605, 181, 558, 8, 392, 348, 623, 2818, 8336],
 [5757, 71, 1104, 3, 183, 37, 413, 1880, 501, 1210, 1473],
 [475, 53, 390, 29],
 [52, 891, 6622, 843],
 [6, 2035, 38, 93, 1254, 39, 2748, 570, 2593],
 [7, 555, 166, 726, 242, 1232, 1466, 692, 655, 712, 771, 1056, 9, 6939, 202],
 [56, 416, 12],
 [382, 128, 485],
 [30, 49, 82],
 [755, 271, 238, 149, 454, 881, 682, 28, 2103],
 [45, 7329, 132, 2237, 7329, 326, 8285, 6496],
 [697, 1282, 162, 3501, 31],
 [1490, 8699, 311, 57, 4384, 77, 294, 260, 208],
 [3048, 74, 32, 1, 65, 1018, 10, 89, 388, 38, 368, 151, 205],
 [121, 393, 7494, 1906],
 [798, 1369, 8132, 2420, 8],
 [88, 29],
 [6, 24, 600, 42, 69, 1777, 37, 353, 64, 25, 2337],
 [3, 2, 74, 2239, 92, 128, 118, 2239, 563],
 [15, 3138, 400, 14, 153, 7388, 739, 7389, 2310, 7390, 3987],
 [66, 446, 2, 748, 286, 21],
 [2333, 351, 3179, 3, 40, 606, 73, 2334, 3161],
 [665, 54, 997, 102, 386, 182, 2346, 158, 242, 1028],
 [12, 545, 245, 2, 124, 937],
 [1791, 808, 1439, 544, 54, 2587, 573],
 [1, 806, 1870, 2298, 269, 493, 1683, 189, 2921, 160, 1508, 1077, 23, 2874],
 [869, 24, 464, 2971, 143, 495, 99],
 [506, 97],
 [41, 6240, 626, 2317, 6, 1587, 407, 11, 258],
 [9, 175, 144, 1234, 2472, 390, 501, 69],
 [378, 66, 127, 72],
 [133, 1816, 50, 76, 36, 858, 1653, 7038, 289, 1097, 2336, 613],
 [41, 396, 144, 11, 6, 21, 390, 1213],
 [229, 761, 106, 1018, 105, 1169],
 [113, 646, 615, 8339, 979, 160, 1110, 403, 587, 659],
 [6, 1477, 120, 77],
 [93, 126, 452, 43, 1406, 51, 100, 736, 1934, 7420],
 [16, 189],
 [219, 74],
 [1066, 28, 477, 1058, 5, 1348, 114, 865, 29],
 [161, 3671, 337, 266, 1371, 42, 326, 44, 44, 326, 1164],
 [2878, 195, 687],
 [1333, 7548, 1958, 352],
 [1151, 3, 353, 21, 162, 1637, 357, 338, 133, 829, 127, 120],
 [1319, 130, 46, 20, 32, 7035, 703, 56],
 [226, 8628, 59, 61],
 [515, 548, 1075, 1125, 321, 15, 1707, 649, 467, 120, 15, 204],
 [1319, 54, 74, 1629],
 [1859, 121, 363, 768],
 [4468, 5950, 132, 1121, 1407, 163, 764, 288, 65, 2002, 13],
 [14, 1322, 100, 43, 1445, 329, 100, 846, 7290, 3696],
 [468, 78, 341, 60, 4, 46],
 [45, 4948, 206, 103, 1313, 576, 1653, 333, 510, 220, 1858],
 [70, 64, 7057, 299, 1, 67],
 [16, 2147, 303, 10, 187],
 [2854, 414, 2521, 784, 172, 371, 173, 268, 84, 561, 1064, 552],
 [1535, 8, 63, 1, 46],
 [45, 4948, 1310, 1667, 3708, 4063, 3614, 739, 7389, 911, 43, 3597],
 [547, 367, 475, 550],
 [17, 239, 36, 35, 22, 736, 100, 43, 204, 14, 781],
 [865, 1312, 210, 815, 545, 254, 441, 114, 1002],
 [1545, 1108, 2, 13, 300, 26, 37, 792],
 [52,
  161,
  1223,
  12,
  76,
  362,
  2634,
  1208,
  437,
  103,
  291,
  686,
  269,
  12,
  76,
  84,
  1017],
 [17, 926, 36, 449, 581, 22, 1226, 1014, 20, 526, 111, 17, 22, 319],
 [2299, 61, 1, 500, 82],
 [81, 25, 19, 150, 55],
 [3070, 1027, 1747, 2267, 334, 86],
 [239, 1115, 586, 285, 1038, 453, 2955],
 [117, 8420],
 [1, 385, 149],
 [249, 1469, 3837, 151, 127, 360, 60, 953, 370, 44, 3536, 5854],
 [4909, 1, 500],
 [404, 119, 2805, 685],
 [9, 4911, 300, 205, 397, 21, 3089, 165],
 [4, 230, 13, 214],
 [270, 1657, 204, 120, 255],
 [643, 2341, 1746, 3445, 1366, 96],
 [402, 1280, 100, 167, 2140, 1948, 2052, 4633],
 [27, 2957, 681],
 [52, 78, 2035, 480, 1751, 5266, 148, 18, 182, 11],
 [1795, 1, 1607, 63, 511, 146, 630, 1],
 [97, 2516, 465, 84],
 [4612, 2076, 915, 307, 41, 157, 1494, 3903, 42, 326, 2473, 187, 4615],
 [4, 182, 1794, 894, 1284, 235],
 [148, 4, 4854, 919],
 [293, 2598, 2540, 44, 1571, 540],
 [963, 3, 164, 110],
 [189, 89, 37, 69, 215, 27, 222],
 [3660, 2532, 4191, 315, 1990, 1421, 458, 120],
 [161, 322, 59, 265, 1451, 1883, 112, 1883],
 [874, 535, 2073, 209, 922, 65, 6704, 4259],
 [404, 610, 380, 404, 610, 7, 3, 7],
 [1086, 2253, 426, 607],
 [502, 597, 281, 480, 1830, 46, 68, 111],
 [4529, 237, 48, 1145, 13, 40, 169, 195, 469, 31, 1087],
 [2407, 31, 168, 656, 24, 1055, 3, 2394, 781, 1409, 461, 145],
 [2449, 74, 1, 2356],
 [17, 489, 36, 35, 22, 1247, 51, 14, 159, 356, 20, 282, 102, 17, 22, 319],
 [206, 103, 1313, 576, 1653, 333, 510, 220],
 [206, 103, 235, 1175, 302, 2438, 2411, 602],
 [218, 5118, 1371, 3293, 763, 58, 715, 97, 7587],
 [8019, 32, 2, 5],
 [642, 32, 3, 1909],
 [1543, 586, 1247, 940, 1509, 32, 211, 93],
 [9],
 [429, 365, 885, 128, 1418, 867, 5710, 2486, 1120],
 [703, 176, 945, 266, 44, 1479, 504, 34],
 [423, 629, 65, 640],
 [393, 27, 271, 50, 10],
 [36],
 [342, 1134],
 [172, 371, 88, 473, 298, 1859, 148],
 [3611, 16, 217, 10, 126],
 [575, 102, 329, 50, 4391, 249, 51, 846, 1113, 36, 7977],
 [15, 605, 181, 558, 8, 392, 348, 532, 310, 1478, 4915],
 [1656, 696],
 [7283, 2403, 528],
 [669, 68, 67, 18, 61],
 [122, 1272, 7422, 2160, 83, 372, 1371, 406],
 [133, 829, 4, 13, 1006, 4119, 1623, 719],
 [515, 2214, 1095, 2474, 383, 6253, 84, 106, 184, 162, 599, 1017],
 [9, 533, 21, 84, 887],
 [4529, 5584, 1, 7733, 2, 436, 179, 328, 126],
 [1, 138, 227, 72, 1813],
 [3889, 3890, 536, 376, 637, 336, 69],
 [351, 242, 16, 20, 4965, 2115, 733, 2709, 1180, 20, 93],
 [795, 3482, 838, 61, 5, 1276, 1902],
 [1849, 1907, 805, 303],
 [41],
 [41, 2450, 3006, 467, 10, 234, 3, 73, 530],
 [2778, 66, 7613, 196, 1488, 590],
 [2449, 628, 446, 2, 27],
 [8, 63, 501, 23],
 [538, 23, 12, 1, 58, 2452, 193, 5737, 1886, 538],
 [804, 146, 4, 275, 307, 440, 2288, 119, 390],
 [10, 50, 486, 694],
 [113, 8173],
 [601, 2393, 21, 869, 297, 2148, 78, 121, 105],
 [263, 5, 69, 3235, 637, 495, 321, 60],
 [9, 197, 138, 234, 3, 441, 304],
 [45, 1395, 116, 98, 457, 80, 302, 753, 801],
 [777, 615, 332, 40, 292],
 [4659, 7834, 154, 106, 2414, 1144, 3910, 642],
 [9, 254, 5, 69, 84, 1116, 124, 2],
 [16, 23],
 [218, 2994, 622, 20, 675, 1587, 743, 2510, 1301, 626, 7857],
 [340, 93, 603, 180, 603, 49, 499, 388],
 [45, 6492, 274, 103, 2683, 5741, 2491, 385, 827, 1929, 2599, 1865, 1259],
 [743, 115, 27, 1579, 54, 301, 73, 237],
 [5793, 8897, 48, 1501, 311, 23],
 [804, 1542, 74, 1276, 2044, 1415, 139, 1556, 3, 120],
 [1441, 11],
 [583, 188, 415, 98, 3116, 1813],
 [917, 395, 29, 24, 225, 445],
 [17, 22, 776, 410, 2248, 1533, 51, 14, 1322, 34],
 [514, 221],
 [185, 367, 8173, 269, 1],
 [1655, 1346],
 [904, 170, 127, 72, 749, 62, 803, 509, 13],
 [5582, 3280, 1635, 151, 346, 406, 691],
 [2409, 8900, 881],
 [1275, 1863, 6015, 600, 6583, 1999, 163, 4235],
 [330, 2393, 458, 120, 2872, 1828, 246, 6292],
 [466, 780, 207, 32, 13, 1442, 2016],
 [2744, 1, 99, 82],
 [456, 106, 329, 703, 186, 172],
 [1543, 962, 65, 628, 203, 98, 229, 175, 182, 8704, 346, 142],
 [2238, 43, 914, 3061, 300, 77, 96, 190, 172, 163, 1222],
 [2781, 4157, 1529, 725, 658, 1809, 4157, 1250, 1887],
 [515, 7050, 1292, 672, 184, 6904, 3085, 128, 311, 49, 162, 185, 23, 2153],
 [619, 2520, 1075, 114, 436, 225, 1, 13, 126],
 [423, 629, 1626, 3021],
 [575, 926, 702, 338, 249, 228, 2091],
 [2150, 44, 4931, 1038, 1095, 277, 698, 3247, 1352, 447, 108, 484, 213, 1555],
 [93, 3, 1826, 38, 3, 782],
 [1576, 322, 59, 265, 337, 91, 91, 447],
 [436, 172, 6, 1991, 325, 702, 233, 341, 151, 691, 466, 297, 1158, 172],
 [37, 25, 19, 500],
 [584, 2, 301, 2382, 75, 69, 593, 120, 32, 1193],
 [830, 66, 72, 138, 169, 16],
 [57, 7, 1609, 843, 3629],
 [228, 1867, 14, 2418, 829, 571, 674, 614, 51, 448, 159],
 [9, 245, 1, 1942, 345, 80, 84],
 [760, 1938, 2379, 5703, 51, 2949, 15, 3572, 1189],
 [5852, 1426, 287, 163, 879, 166, 543, 83, 1199, 9],
 [3223, 1216, 40],
 [7, 739, 783, 1530, 1448, 643, 2889, 224, 2318, 598, 3202],
 [283, 985, 620],
 [2098, 2718, 950, 2340, 7789, 96, 2519],
 [2, 5349, 85, 932, 8034],
 [1543, 488, 446, 2, 134, 184, 184, 3430, 3294, 151, 71],
 [9, 21, 144],
 [552, 918, 2175, 59, 557, 2907, 92, 1253, 154, 186, 8179],
 [369, 500],
 [48, 358, 33, 26],
 [9, 4307, 6312, 1988],
 [2841, 1178, 44, 1463, 713, 150, 2370, 230, 2267, 257, 3128, 1112, 277],
 [997, 27, 1164, 4, 324, 2460, 1123],
 [31, 562, 564, 396, 667, 458, 443, 214, 102, 255, 50],
 [100, 43, 559, 361, 871, 1087, 576, 565, 3542, 512],
 [81, 25, 19, 92],
 [4150, 419, 18, 397, 992, 574],
 [504, 24, 210, 256],
 [1741, 696, 291, 1, 112],
 [1046, 2726, 777, 568, 353, 168, 3361],
 [475, 3667, 90, 2714, 29, 817, 163, 204, 13],
 [974, 12, 596, 377],
 [595],
 [444, 2935, 334, 3022],
 [109, 304, 4550, 1517, 1170, 1070, 98, 722, 121],
 [15, 605, 181, 558, 8, 392, 348, 371, 296, 4325, 6904, 30, 2855, 3889],
 [5686, 5232, 7, 631, 40, 364],
 [5218, 1553, 233, 37, 25, 236, 4, 418, 154, 2884],
 [364, 1993, 1372, 6744, 1197, 1047, 173, 1594, 371, 561],
 [17, 239, 36, 35, 22, 316, 131, 14, 636, 1980, 100, 43, 17, 22, 319],
 [724, 489, 1240],
 [475, 1913, 789, 2320, 2],
 [128],
 [9, 514, 3660, 245, 141, 365],
 [503, 59, 11, 233],
 [258, 12, 216, 24, 44, 284, 24],
 [2, 1104, 470, 44],
 [910, 183, 1942, 223, 64, 99, 5, 185],
 [2180, 322, 59, 265, 337, 91, 447, 91, 112, 447],
 [8339, 12, 158, 48, 142],
 [292, 979, 104, 436],
 [7, 2101, 324, 699, 1635, 2423],
 [70, 25],
 [5452, 607, 8718, 37, 1090, 20],
 [773, 128],
 [623, 23, 69, 4, 1721, 2224, 113, 568],
 [547, 6824, 1164, 76, 2136, 388, 12, 136],
 [186, 734, 599, 125, 218, 58, 317, 2154, 1048],
 [163, 12, 7, 1187],
 [2505, 1694, 835, 7336, 4307, 155, 803, 789],
 [5938, 254, 662, 13],
 [387, 12, 301, 68, 2],
 [428, 205, 120, 12, 120, 2598, 165, 6972, 146, 77, 7, 2191, 594],
 [679, 33, 5]]
In [54]:
print("The encoding for document\n", X_train[1:2],"\n is: ", train_sequences[1])
The encoding for document
 3584    [strong, buying, close]
Name: Text Without Punc & Stopwords, dtype: object 
 is:  [53, 57, 21]
In [55]:
# Added padding to training and testing
padded_train = pad_sequences(train_sequences, maxlen = 29, padding = 'post', truncating = 'post')
padded_test = pad_sequences(test_sequences, maxlen = 29, truncating = 'post')
In [56]:
for i, doc in enumerate(padded_train[:3]):
     print("The padded encoding for document:", i+1," is:", doc)
The padded encoding for document: 1  is: [3720 3721  295  672  135 2529 1909 3722    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0]
The padded encoding for document: 2  is: [53 57 21  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0]
The padded encoding for document: 3  is: [1545   89 1910 1546   23    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0]
In [57]:
# Converted the data to categorical 2D representation
y_train_cat = to_categorical(y_train, 2)
y_test_cat = to_categorical(y_test, 2)
In [58]:
y_train_cat.shape
Out[58]:
(5211, 2)
In [59]:
y_test_cat.shape
Out[59]:
(580, 2)
In [60]:
y_train_cat
Out[60]:
array([[1., 0.],
       [0., 1.],
       [0., 1.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]], dtype=float32)
In [61]:
# Added padding to training and testing
padded_train = pad_sequences(train_sequences, maxlen = 15, padding = 'post', truncating = 'post')
padded_test = pad_sequences(test_sequences, maxlen = 15, truncating = 'post')
In [62]:
for i, doc in enumerate(padded_train[:3]):
     print("The padded encoding for document:", i+1," is:", doc)
The padded encoding for document: 1  is: [3720 3721  295  672  135 2529 1909 3722    0    0    0    0    0    0
    0]
The padded encoding for document: 2  is: [53 57 21  0  0  0  0  0  0  0  0  0  0  0  0]
The padded encoding for document: 3  is: [1545   89 1910 1546   23    0    0    0    0    0    0    0    0    0
    0]
In [63]:
# Converted the data to categorical 2D representation
y_train_cat = to_categorical(y_train, 2)
y_test_cat = to_categorical(y_test, 2)
In [64]:
y_train_cat.shape
Out[64]:
(5211, 2)
In [65]:
y_test_cat.shape
Out[65]:
(580, 2)
In [66]:
y_train_cat
Out[66]:
array([[1., 0.],
       [0., 1.],
       [0., 1.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]], dtype=float32)

Understanding the Theory and Intuition behind Recurrent Neural Networks and Long Short Term Memory Networks (LSTM)

Introduction to Recurrent Neural Networks (RNN)

  • Feedforward Neural Networks (vanilla networks) map a fixed size input (such as image) to a fixed size output (classes or probabilities).
  • A drawback in Feedforward networks is that they do not have any time dependency or memory effect.
  • A RNN is a type of ANN that is designed to take temporal dimension into consideration by having a memory (internal state) (feedback loop).
  • A RNN contains a temporal loop in which the hidden layer not only gives an output but it feeds itself as well.
  • An extra dimension is added which is time!
  • RNN can recall what happened in the previous time stamp so it works great with sequence of text.

Long Short Term Memory Networks

  • LSTM contains gates that can allow or block information from passing by.
  • Gates consist of a sigmoid neural net layer along with a pointwise multiplication operation.
  • Sigmoid output ranges from 0 to 1:
    • 0 = Don't allow any data to flow
    • 1 = Allow everything to flow!

Build a Custom-Based Deep Neural Network to Perform Sentiment Analysis

Embedding Layer

  • Embedding layers learn low-dimensional continuous representation of discrete input variables.
  • For example, let say we have 100000 unique values in our data and want to train the model with these data. Though we can use these as such, it would require more resources to train. With embedding layer, you can specify the number of low-dimensional feature that you would need to represent the input data, in this case let's take the value to be 200.
  • Now, what happens is embedding layer learns the way to represent 100000 variables with 200 variables, similar to Principal Component Analysis (PCA) or Autoencoder.
  • This in-turn helps the subsequent layers to learn more effectively.
In [67]:
# Sequential Model
model = Sequential()

# embedding layer
model.add(Embedding(total_words, output_dim = 512))

# Bi-Directional RNN and LSTM
model.add(LSTM(256))

# Dense layers
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.3))
model.add(Dense(2,activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['acc'])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 512)         4857344   
_________________________________________________________________
lstm (LSTM)                  (None, 256)               787456    
_________________________________________________________________
dense (Dense)                (None, 128)               32896     
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 258       
=================================================================
Total params: 5,677,954
Trainable params: 5,677,954
Non-trainable params: 0
_________________________________________________________________
In [68]:
# train the model
model.fit(padded_train, y_train_cat, batch_size = 32, validation_split = 0.2, epochs = 2)
Epoch 1/2
131/131 [==============================] - 15s 104ms/step - loss: 0.6348 - acc: 0.6452 - val_loss: 0.5346 - val_acc: 0.7402
Epoch 2/2
131/131 [==============================] - 13s 102ms/step - loss: 0.3146 - acc: 0.8836 - val_loss: 0.5913 - val_acc: 0.7622
Out[68]:
<tensorflow.python.keras.callbacks.History at 0x269061e6d48>

Trained the Model using Different Embedding Output Dimension

In [69]:
#model = Sequential()

# embedding layer
#model.add(Embedding(total_words, output_dim = 256))

# Bi-Directional RNN and LSTM
#model.add(Bidirectional(LSTM(128)))

# Dense layers
#model.add(Dense(128, activation = 'relu'))
#model.add(Dense(1,activation = 'sigmoid'))
#model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['acc'])
#model.summary()

Assessed Trained Model Performance

In [70]:
# Made prediction
pred = model.predict(padded_test)
In [71]:
# Made prediction
prediction = []
for i in pred:
  prediction.append(np.argmax(i))
In [72]:
# list containing original values
original = []
for i in y_test_cat:
  original.append(np.argmax(i))
In [73]:
# Accuracy score on text data
accuracy = accuracy_score(original, prediction)
accuracy
Out[73]:
0.7396551724137931
In [74]:
# Plotted the confusion matrix
cm = confusion_matrix(original, prediction)
sns.heatmap(cm, annot = True)
Out[74]:
<matplotlib.axes._subplots.AxesSubplot at 0x2690c612808>
In [75]:
# Used pipeline from transformer to perform specific task. 
# Mentioned sentiment analysis as task and passed in the string to it, to get the results
# We can specify tasks like topic modeling, Q and A, text summarization here.

#nlp = pipeline('sentiment-analysis')

# Made prediction on the test data
#pred = nlp(list(X_test))

# Since predicted value is a dictionary, get the label from the dict
#prediction = []
#for i in pred:
#  prediction.append(i['label'])

# print the final results
#for i in range(len(prediction[:3])):
#  print("\n\nNews :\n\n", df[df.combined == X_test.values[i]].Text.item(), "\n\nOriginal value :\n\n",
#      category[df[df.combined == X_test.values[i]].Sentiment.item()], "\n\nPredicted value :\n\n", prediction[i], "\n\n\n")